ABSTRACT


This thesis studies the polygraph Empirical Scoring System (ESS) to determine its
potential use in homeland security and the war on terror. The research based its analysis
on raw data previously collected by other researchers, who removed identifying information
from the data and subsequently provided it for study here. The results are described with
regard to criterion accuracy; diagnostic capability; proportions of correct, erroneous, and
inconclusive results; and differences in scoring accuracy based on participant
employment and experience. Twelve scorers in three cohorts scored 22 You-Phase
examinations taken from the Department of Defense-confirmed archives. One cohort
used the three-position test data analysis (TDA) system, another cohort used the
seven-position TDA system, and the final cohort used the ESS TDA system. All three TDA
systems showed equivalent diagnostic ability. ANOVAs showed no significant differences
between the distributions of ESS and transformed scores. No significant differences were
found in decision accuracy (rates of correct, inconclusive, and erroneous decisions)
between ESS scores and those from the other two TDA systems. The results suggest that
ESS can complement other current hand-scored TDA systems; however, this study cannot
confirm that it could supplant them. Further study is recommended.
I. INTRODUCTION


Physiology has been used in the United States in the detection of deception since
World War I, when the government commissioned Dr. William Marston¹ to devise a
technique to question prisoners of war (Alder, 2007). Intelligence officials from the
National Research Council sponsored Marston’s research. In his first real-world case,
Marston used his techniques to attempt to identify the culprit in the theft of a military
codebook from the United States Surgeon General’s office. Although he narrowed the
field of suspects to one, there is no record that the identified man was ever charged or in
fact had committed the theft (Alder, 2007). Method more than instrumentation was
Marston’s contribution to lie detection. He believed that, by monitoring changes in
systolic blood pressure, verbal deception could be detected. As described by Ball and
Gillespie on their website:

He used a standard blood pressure cuff, or sphygmomanometer, to take
measurements of systolic blood pressure during interrogation. This was
the first time anyone used any kind of an instrument to detect truthfulness
or deception. His method was simple. Take and record the subject’s blood
pressure, release the cuff. Ask the subject a question. Take and record the
subject’s blood pressure once again to identify any changes. He called this
the “discontinuous method” of detecting deception. (Ball & Gillespie
Polygraph, n.d.)²

The polygraph, with this long and controversial history, has been used at the
federal, state, and local levels for a variety of purposes ever since. These uses include
criminal cases, pre-employment screening, informant and witness testing, and
counterintelligence purposes (Warner, 2005). There are 26 federal polygraph programs


1  Dr.  William  Marston  was  a  Harvard  psychologist  who  is  likely  better  known  for  the  creation  of  the 
comic  book  character  “Wonder  Woman”  under  the  nom  de  plume  “Charles  Moulton.” 

2  In  a  scholarly  article  on  the  history  of  the  polygraph,  Paul  Trovillo  notes  that  Angelo  Mosso,  an 
Italian  psychologist  who  studied  under  Cesare  Lombroso,  first  experimented  with  a  plethysmograph  to 
study  the  effects  of  fear  on  human  blood  pressure.  These  experiments,  as  well  as  several  that  came  later, 
were  viewed  as  instrumental  in  the  early  study  of  the  polygraph.  In  1895,  Lombroso,  an  Italian  physician, 
psychiatrist,  and  criminologist,  modified  a  medical  instrument  known  as  a  hydrosphygmograph  (similar  to 
a modern cardiosphygmograph) to measure the blood pressure and pulse rate of a criminal suspect under
police  interrogation.  This  is  believed  to  be  the  first  application  of  a  mechanical  instrument  for  lie  detection 
(Trovillo,  1972). 
spread across nine federal agencies (see Table 1 for a listing of the polygraph programs),
as  well  as  numerous  state  and  local  law  enforcement  agencies. 


Table 1. Federal Agencies That Utilize the Polygraph³

Department of Defense                         Non-Department of Defense

Air Force Office of Special Investigations    Alcohol, Tobacco and Firearms
Army Intelligence Polygraph Program⁴          Bureau of Prisons/Office of Internal Affairs
Army Intelligence Polygraph Program⁵          Customs and Border Protection/Internal Affairs
Defense Criminal Investigative Service        Coast Guard Investigative Service
Defense Intelligence Agency                   Central Intelligence Agency
Naval Criminal Investigative Service          Drug Enforcement Administration
National Geospatial-Intelligence Agency       U.S. Department of Energy
National Reconnaissance Office                Federal Bureau of Investigation
National Security Agency                      Food and Drug Administration
U.S. Army Criminal Investigation Command      Homeland Security Investigations⁶
Defense Intelligence Agency                   Internal Revenue Service-Criminal Investigation
                                              Transportation Security Administration
                                              U.S. Postal Inspection Service
                                              U.S. Postal Inspection Service, Office of Inspector General
                                              United States Secret Service
                                              Veteran Affairs Office, Office of Inspector General

3 As of February 3, 2012.

4 Formerly the United States Army Intelligence and Security Command (INSCOM).

5 Formerly the United States Army Intelligence and Security Command (INSCOM).

6 Formerly Immigration and Customs Enforcement (ICE).

The  controversy  over  polygraph  validity  and  reliability  is  ongoing,  but  the  utility 
of  the  polygraph  to  obtain  information  is  widely  acknowledged  (Warner,  2005).  In 
homeland  security  and  the  war  on  terror,  the  polygraph  has  many  applications. 
Specifically,  it  has  been  used  by  intelligence  and  other  federal  agencies  for 
counterintelligence  and  espionage  purposes.  Many  agencies  use  it  as  part  of  ongoing 
security  screening  programs  for  current  employees.  The  Central  Intelligence  Agency’s 
(C.I.A.) Aldrich Ames case and the Department of Energy’s Wen Ho Lee⁷ case are just
two  controversial  examples  of  polygraph  use  in  espionage  investigations  at  the  federal 
level.  These  two  cases  exemplify  why  scoring  techniques  are  so  important  to  the  field  and 
why  poor  technique  or  diagnostics  or  lack  of  interrater  reliability  can  be  detrimental  to 
national  security.  The  common  public  perception  is  that  Ames  passed  his  polygraphs 
(Alder,  2007;  Pentagon’s  intelligence  arm,  2008)  while  the  polygraph  was  partially 
responsible  for  the  bungling  of  the  Lee  investigation  (Hoffman  &  Stober,  2001;  Alder, 
2007; Wen Ho Lee’s Problematic Polygraph, 2000). It is a myth that the polygraph’s
alleged failure allowed the two men to continue their deception.⁸ These cases raised
questions  about  the  very  foundation  on  which  the  government  bases  its  use  of  the 
polygraph  for  national  security  purposes.  A  brief  look  at  each  case  will  demonstrate  some 
of  the  issues  of  test  data  analysis,  the  component  of  the  polygraph  process  that  this 
research  studies. 


7 It is of note that the National Academy of Sciences believed the Lee case so important to the
government’s reliance on the polygraph that it devoted an appendix to the case in its report (The polygraph
and lie detection, 2003) and that eNotes, a popular research site for students and teachers, uses it as its case
study for polygraph on its website (Lerner & Lerner, 2006).

8 This statement is based on personal interviews with primary sources who cannot be identified due to
security concerns. These personal conversations have taken place during the 25-plus-year polygraph career
of the author.
Ames, a Central Intelligence Agency Directorate of Operations officer, was
arrested in 1994 for selling information to the Soviet Union. According to published accounts,
Ames had spied for the KGB for nine years, and his duplicity had resulted in the death of
at least ten agents who had spied for the C.I.A. in the Soviet Union (Earley, 1997; C.I.A.,
n.d.). In 1994, Dan Glickman, the House Intelligence Committee chairman, noted that the
Federal Bureau of Investigation had concluded that Ames did not pass either of two tests
(Kleiner, 2002). Then-C.I.A. Director James Woolsey in 1994 revealed that the F.B.I. had
not properly investigated Ames’s two failed polygraphs (Kleiner, 2002).

Wen Ho Lee, a naturalized United States citizen, came under suspicion as a Chinese spy
in 1995, after his employer, the Department of Energy (DOE), deduced that China had
stolen classified nuclear weapons designs that allowed the country to develop a
miniaturized nuclear warhead (Wen Ho Lee Case Study, 2008). Lee had been employed
at the DOE’s Los Alamos National Laboratory in New Mexico since 1978 and later
became a nuclear weapons scientist at the laboratory. During the 1980s and 1990s Lee
had numerous contacts with Chinese officials and scientists, some on official business
and others while attending parties or conferences (Hoffman & Stober, 2001). As part of
his employment, Lee was subject to periodic polygraph examinations: one in 1984, one in
1998, and another in 1999 (Polygraph and lie detection, 2003). However, the results of
these polygraphs are in dispute (Polygraph and lie detection, 2003; Wen Ho Lee’s
Problematic Polygraph, 2000). More specifically, the concern arises from disagreement
between the original polygraphist and the polygraphists who later reviewed the
polygraph charts as to whether Lee was truthful.⁹ The interpretation and scoring of
polygraph charts is the focal point of this thesis.

The  polygraph  is  used  by  federal,  state,  and  local  governments  in  determining  the 
credibility  and  suitability  of  prospective  employees  who  potentially  will  have  a  role  in 
homeland  security  and/or  the  war  on  terror.  The  author  has  primary-source  information 

9 What does not seem to be at issue is that Lee illegally removed huge amounts of classified nuclear
information from the laboratory, estimated at over 400,000 pages, and that once removed, its final
destination(s) have never been learned (Shelby, 2001).
that polygraph pre-employment screening in a major city law enforcement agency
uncovered two attempted infiltrations, one by a Chinese operative and the second by a
member of Al-Qaeda.¹⁰ In the case of the Chinese operative, the agent was to gain
employment  at  a  law  enforcement  agency  and  work  there  long  enough  to  establish  a 
record  of  credibility  in  order  to  later  become  an  employee  of  a  federal  law  enforcement 
agency.  In  the  case  of  the  Al-Qaeda  affiliated  applicant,  the  effort  was  just  an  attempt  to 
infiltrate  law  enforcement  in  a  major  city,  one  with  a  large  Muslim  population. 

The  polygraph  has  been  used  in  Guantanamo  Bay,  Kandahar,  Bagram,  and  other 
front-line  combat  theatres.  In  September  2003,  the  Air  Force  Office  of  Special 
Investigations  (AFOSI)  deployed  its  first  full-time  polygraphist  to  Baghdad  (Collins, 
2004,  p.  1).  Prior  to  this,  the  Air  Force  had  deployed  polygraphists  on  temporary  duty 
(TDY). The then-polygraph program manager, Special Agent Pat Muller, was quoted
as  saying,  “The  polygraph  exams  we  have  administered  over  there  have  been  some  of  the 
most  critical  and  important  work  we  have  ever  done  in  this  program”  (Collins,  2004,  p. 
1).  The  scope  of  examinations  in  the  theatre  of  war  includes  vetting  coalition  force 
members,  determining  the  veracity  of  prisoners  and  informants  on  whose  information 
tactical  operations  are  initiated,  and  assisting  in  the  conduct  of  criminal  investigations 
(Collins,  2004,  p.  1). 

A.  PROBLEM  STATEMENT 

Information  provided  to  decision  makers  should  be  as  accurate,  trustworthy,  and 
robust  as  possible,  and  it  is  clear  that  the  polygraph  plays  an  important  role  in  achieving 
these  requirements.  Each  day  decision  makers  in  federal,  state,  and  local  governments 
rely  on  the  results  of  polygraph  examinations  to  make  their  decisions.  In  its  2002 


10 Due to the classification, sensitivity, and state civil-service rules, neither the name of the agency nor
the minute details can be divulged.
Polygraph Program Annual Report to Congress (Department of Defense, 2003), the
Department of Defense (DoD) reported that it had conducted 11,566 polygraph
examinations.¹¹

The possible results of a polygraph examination are “no deception indicated”
(passed), “deception indicated” (failed), or “inconclusive” (the tracings were such that no
opinion can be rendered). These judgments are rendered using one of several scoring
mechanisms. Of the manual scoring systems in common use, only ESS delivers error
estimates. That is to say, no other current scoring mechanism allows
the polygraphist or the consumer to compare a calculated probability of error to a stated
tolerance for error (Handler et al., 2010). The p-value maps the scores onto a probability
distribution such that the consumer can estimate the error likelihood of a decision based
on the scores. These error estimates allow the consumer to make a more informed value
judgment about tolerance for risk or error. Other current scoring systems in use do not
have the same empirical level of decision accuracy as ESS (Handler et al., 2010). ESS
provides accuracy profiles that include the total proportions of correct and inconclusive
results, results for deceptive and truthful cases, sensitivity, specificity, false negative
errors (liars called non-deceptive), and false positive errors (truthful called deceptive).
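
The components of such an accuracy profile can be illustrated with a short sketch. The
code below is not taken from the ESS literature; the container `Counts`, the function
`accuracy_profile`, and the example counts are hypothetical, and the sketch assumes that
inconclusive results remain in the denominators when sensitivity and specificity are
computed.

```python
from dataclasses import dataclass


@dataclass
class Counts:
    """Decision counts for confirmed cases (hypothetical container)."""
    tp: int       # deceptive examinees called deceptive
    fn: int       # deceptive examinees called non-deceptive (false negatives)
    inc_dec: int  # deceptive examinees with inconclusive results
    tn: int       # truthful examinees called non-deceptive
    fp: int       # truthful examinees called deceptive (false positives)
    inc_tru: int  # truthful examinees with inconclusive results


def accuracy_profile(c: Counts) -> dict:
    """Return the proportions that make up an accuracy profile.

    Assumption: inconclusive results are kept in every denominator.
    """
    deceptive = c.tp + c.fn + c.inc_dec
    truthful = c.tn + c.fp + c.inc_tru
    total = deceptive + truthful
    return {
        "correct": (c.tp + c.tn) / total,
        "inconclusive": (c.inc_dec + c.inc_tru) / total,
        "errors": (c.fn + c.fp) / total,
        "sensitivity": c.tp / deceptive,
        "specificity": c.tn / truthful,
        "false_negative_rate": c.fn / deceptive,
        "false_positive_rate": c.fp / truthful,
    }


# Invented counts for 10 deceptive and 10 truthful confirmed cases.
profile = accuracy_profile(Counts(tp=8, fn=1, inc_dec=1, tn=7, fp=2, inc_tru=1))
for name, value in profile.items():
    print(f"{name}: {value:.2f}")
```

Reporting the inconclusive proportion alongside the error proportions keeps a low error
rate from being achieved simply by rendering fewer opinions.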

The lack of error estimates has been the state of the profession since the early use of
the polygraph, much to the derision of its critics. The employment of an empirically based
scoring mechanism would allow polygraphists to render an opinion based upon confidence in a
scientifically derived result. The questions therefore become: are the accuracy profiles of
the scoring mechanism that provides that p-value at least as good as those of
current scoring mechanisms; how can it be applied; and would it be accepted?


11 This is the final report made to Congress, since Congress relieved the DoD of its reporting
responsibilities  after  fiscal  year  2002.  No  current  figures  are  available  as  to  the  number  of  examinations 
currently  conducted  by  the  DoD.  In  1991,  Congress  authorized  the  DoD  to  conduct  no  more  than  5,000 
CSP  examinations  annually.  However,  this  quota  was  lifted  in  2005,  and  there  is  currently  no  cap  on  CSP 
examinations.  The  figure  of  8,512  includes  those  conducted  by  the  DoD  for  non-DoD  federal  agencies.  It  is 
also  noted  that  these  numbers  include  only  the  DoD  and  not  the  National  Security  Agency  or  those 
conducted  under  the  authority  of  the  director  of  Central  Intelligence. 
B. RESEARCH QUESTION

The  broad  question  under  consideration  is  whether  the  accuracy  profiles 
associated  with  various  scoring  techniques  should  have  an  impact  on  the  technique 
chosen  in  the  homeland  security  arena.  Additional  questions  to  support  this  analysis  will 
include:  1)  Are  there  differences  in  the  effectiveness  of  the  three-position,  seven-position, 
and  ESS  test  data  analysis  (TDA,  chart  interpretation)  models  at  extracting  diagnostic 
information  from  the  raw  data,  as  reflected  by  the  distributions  of  numerical  scores?  2) 
Are  there  significant  differences  in  criterion  accuracy  for  the  three-position,  seven- 
position,  and  ESS  TDA  models?  3)  What  is  the  effect  on  accuracy  of  transforming  three- 
position  and  seven-position  scores  to  ESS  scores?  4)  How  accurate  are  the  combined 
three-position,  seven-position,  and  ESS  results?  How  accurate  are  the  combined  results 
when  all  scores  are  transformed  to  ESS  scores?  Is  the  difference  significant?  5)  Are  there 
differences  in  accuracy  that  can  be  attributed  to  experience?  Does  more  experience  result 
in  increased  accuracy?  6)  Does  accuracy  vary  with  the  examiner’s  type  of  employment? 
Are  there  differences  between  private  examiners  and  those  who  work  for  government 
(law  enforcement/federal  government)  agencies? 
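
Question 1 compares score distributions across the three TDA models, and the abstract
reports that ANOVAs were used for this purpose. As a minimal sketch of that kind of
comparison, the following computes a one-way ANOVA F statistic for three groups of
scores. The function name `one_way_anova_f` and the scores are invented for illustration
and are not the study's data or its exact procedure.

```python
from statistics import mean


def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of score groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)
    ms_between = ss_between / (k - 1)
    ms_within = ss_within / (n_total - k)
    return ms_between / ms_within


# Invented total scores for three cohorts (three-position, seven-position, ESS).
cohorts = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
print(one_way_anova_f(cohorts))
```

The F statistic would then be compared against the F distribution with (k - 1, N - k)
degrees of freedom to judge significance.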
II. LITERATURE REVIEW


The polygraph has demonstrated an important role in homeland security and the
war on terror. This role has included the screening of personnel within many federal,
state, and local agencies across the United States to help ensure that prospective
hires do not have an illicit motive for joining the ranks. It is important to
understand not only that criminals and terrorists alike have attempted, and at times
succeeded in, acts that threaten homeland security, but also that enemies of the nation
intend to spy and/or recruit potential agents for the purposes of espionage within our
intelligence agencies and throughout other levels of government. The polygraph was used
in World War I in counter-intelligence operations. It gained greater and more specific use
in Korea (Alder, 2007). Since then, it has been used to assist decision makers in making
strategic and tactical decisions that directly protected American assets and lives as well
as those of our allies. This review will identify literature on polygraph scoring
techniques that is currently relevant to the topic, as well as literature that will enhance
the reader’s understanding of the field of the polygraph.

This literature review will address three areas related to the polygraph and hand-scoring
techniques. The first section will give a brief overview of polygraph research.
The  second  section  will  provide  an  overview  of  types  of  testing  techniques.  Finally,  the 
third  section  will  discuss  research  related  to  scoring  techniques. 

A. POLYGRAPH RESEARCH

Polygraph  expert  and  researcher  Stuart  M.  Senter  claims  that  polygraph 
examination  is  an  inimitable  field:  “Polygraph  examiners  are  trained  to  accomplish  a  task 
that,  in  the  mind  of  the  public,  should  only  be  made  possible  through  rapid  advances  in 
seemingly  futuristic  technological  equipment  or  through  the  weaving  of  mystical  powers 
thought  to  be  proffered  by  wizards  and  magicians.”  In  other  words,  Senter  is  implying 
that  many  consider  polygraph  nothing  but  a  magic  trick,  subject  to  ridicule  and  derision 
(Senter,  2008).  Senter  goes  on  to  note  that  providing  a  more  pragmatic  view  of  the 
polygraph will be accomplished through increasing the body of knowledge about the
field. To date, the research has been applied in nature. That is, it focuses on real-world
problems and tends to ignore theoretical knowledge. As a result, the basic
foundations of polygraph principles have been neglected. There is little work on
understanding the factors that bear on the diagnostic value of the polygraph (Senter, 2008,
p. 278).

The National Research Council points out that there must be a solid theoretical
base to have confidence in polygraph tests, lest erroneous results in populations such as
“spies and terrorists” fail national security (Polygraph and lie detection, 2003, p. 92).
However, the field has not made proper use of theoretical systems about the processes
that underlie the measurements taken by the polygraph (Polygraph and lie detection,
2003, p. 93). Further, the concept of decision thresholds (which are part
of scoring techniques) has largely been ignored in polygraph research.

The consensus is that, although the research base is improving, robust research must
continue to be pursued if the polygraph is to enter the realm of recognized science.

B. TESTING TECHNIQUES

This  section  is  not  an  exhaustive  overview  of  all  testing  techniques  in  use  in  the 
field  of  polygraph  examination.  It  is  a  literature  review  of  sources  pertaining  only  to  the 
most common techniques currently being utilized. Donald Krapohl and Shirley Sturm, in
their 2002 article in Polygraph, identify a number of testing techniques. The Air Force
Modified  General  Question  Test  is  a  single-issue,  multiple-issue,  or  multi-facet  technique 
(Krapohl  &  Sturm,  2002b).  The  Comparison  Question  Technique  is  a  term  applied  to  a 
number  of  test  formats  that  use  probable-  or  directed-lie  test  questions.  A  Concealed 
Information  Test  is  a  type  of  test  that  involves  a  series  of  tests  in  which  one  critical  item 
is  used  in  each  series.  The  intent  of  the  test  is  to  determine  the  person’s  knowledge  of  the 
particular  item.  A  Counterintelligence-Scope  Polygraph  (CSP)  is  a  type  of  test  given  to 
federal  government  employees  who  have  access  to  sensitive  security  information.  The 
CSP  is  designed  to  “detect  and  deter  espionage,  security  breaches,  sabotage,  or  other  acts 
against the government” (Krapohl & Sturm, 2002a, p. 172). A test format that is widely
used in the field is known as the Modified General Question Test (MGQT). The MGQT
consists of more relevant questions than comparison questions. It does not use what is
known as a “symptomatic question.”¹² A Modified Relevant/Irrelevant Technique is a
specific-issue test that uses situational comparison questions, which are then compared to
the relevant questions. The relevant/irrelevant technique is a family of test formats that
forgo the use of a traditional comparison question. They are most widely used in
screening tests. U.S. government agencies use a test known as the Test for Espionage and
Sabotage, which is a multi-issue screening test typically used with government
employees who have access to sensitive information or programs/projects. The Utah
Technique uses modules of questions that consist of a comparison,
relevant, and irrelevant question. The You Phase is a single-issue test in which the
relevant question is slightly varied throughout the test. It is a highly focused test. The
Zone Comparison Test (ZCT) uses three zones, or categories of questions
(relevant, comparison, and symptomatic), and compares two of the zones (relevant
and comparison) to determine whether the examinee was truthful or deceptive. It is
designed to “focus their attention to specific zone question(s). It is the first modern
polygraph technique to which numerical analysis was widely applied” (Krapohl & Sturm,
2002b).


C.  SCORING  TECHNIQUES 

The  global  evaluation  technique  is  one  in  which  the  polygraphist  visually  inspects 
the  charts  to  determine  whether  there  is  a  stronger  response  to  the  relevant  questions.  It  is 
most  commonly  used  to  score  the  Relevant/Irrelevant  Technique  (RI  Technique).  The 
NRC,  as  well  as  Krapohl  and  Dollins,  notes  that  there  is  a  lack  of  standardization  to  the 


12 A “symptomatic question” is a question used to identify whether or not an examinee is fearful that
the polygraphist will ask an unreviewed question embracing an outside issue that is bothering the
examinee.  This  mistrust  of  the  examiner  will  putatively  dampen  the  examinee’s  responses  to  other  test 
questions.  Symptomatic  questions  are  widely  used,  though  the  trend  in  the  research  is  that  there  is  no 
meaningful  effect  (Krapohl  &  Sturm,  2002a). 
scoring technique and that it has numerous idiosyncrasies (Polygraph and lie detection,
2003; Krapohl & Dollins, 2003). Literature on this technique is scant, and its general use
has declined.

The technique favored by most current polygraphists is numerical scoring in its
several variations. The introduction of numerical scoring for the Comparison Question
Technique is attributed to Cleve Backster, a well-known school director and instructor in
modern polygraph techniques (Weaver, 1980). He introduced the seven-position scoring
TDA system. The scale assigns scores ranging between +3 and -3 to the relevant
questions as compared with their “comparison” questions. Weaver notes that the scoring
technique was first developed by Backster to assist students in chart analysis in classroom
settings. In later research conducted at the University of Utah, it was concluded that
numerical scoring had higher rates of accuracy and reliability than other scoring
techniques (Raskin, Barland, & Podlesny, 1978), and it became the benchmark for the
profession. The scoring system has evolved to include a three-position TDA system. This
scoring system is now in wide use by polygraphists.
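
The relationship between the two scales can be sketched in a few lines. The example
assumes that the three-position scale runs from -1 to +1 and that a seven-position score
collapses to three positions by keeping only its sign. Both points are assumptions made
for illustration and are not stated in the sources reviewed here. The function name
`to_three_position` and the scores are hypothetical.

```python
def to_three_position(score: int) -> int:
    """Collapse a seven-position score (-3..+3) to a three-position score (-1..+1)."""
    if not -3 <= score <= 3:
        raise ValueError("seven-position scores must lie between -3 and +3")
    # Keep only the direction of the score, discarding its magnitude.
    return (score > 0) - (score < 0)


# Invented seven-position scores for six question comparisons.
seven_position = [3, -2, 0, 1, -1, 2]
three_position = [to_three_position(s) for s in seven_position]

print(three_position)                            # [1, -1, 0, 1, -1, 1]
print(sum(seven_position), sum(three_position))  # 3 1
```

The totals differ because the three-position scale discards magnitude, which is one reason
cut scores cannot simply be carried over from one scale to the other.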

Krapohl and Dollins undertook what they described as a rudimentary
investigation of the three primary scoring rule systems that can be applied to these
numerical scoring techniques (Krapohl & Dollins, 2003). The three scoring systems are
known as the Utah, the Backster, and the federal scoring systems. These scoring systems
have three common components: scoring rules, computation rules, and decision rules (cut
scores) (Krapohl & Dollins, 2003, p. 150). It is important to understand these three terms
as used in the literature, as they will be explored further as part of this research. Scoring
rules are those that relate to the choice of tracing features in the charts, rejection of
artifacts, and the choice of how question pairs are compared and numbers assigned to the
scheme. Computation rules describe how the numbers are weighted and combined.
Decision rules, otherwise known as cut scores, govern the relationship between the
computation rules and the polygraphist’s choice of a decision (opinion), which will either
be Deception Indicated (DI), No Deception Indicated (NDI), or inconclusive (INC)
(Krapohl & Dollins, 2003).
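
How the three components fit together can be sketched as follows. This is an illustration
only, not the Utah, Backster, or federal rules: it assumes that the scoring rules have
already produced numerical scores, that the computation rule is an unweighted sum, that
positive totals favor truthfulness, and that the cut scores are +6 and -6. The function
names and values are hypothetical.

```python
def compute_total(scores_by_chart):
    """Computation rule (assumed): an unweighted sum of all assigned scores."""
    return sum(sum(chart) for chart in scores_by_chart)


def decide(total, ndi_cut=6, di_cut=-6):
    """Decision rule: map a total to DI, NDI, or INC using hypothetical cut scores.

    Assumption: positive totals favor truthfulness, negative totals deception.
    """
    if total >= ndi_cut:
        return "NDI"
    if total <= di_cut:
        return "DI"
    return "INC"


# Invented scores for two charts, as produced by some set of scoring rules.
charts = [[1, 2, -1], [2, 2, 1]]
total = compute_total(charts)
print(total, decide(total))  # 7 NDI
```

Keeping the decision rule separate from the computation rule makes the cut scores an
explicit, adjustable parameter.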
Decision rules predominated in conclusions reached by the NRC and by Krapohl,
Stern, and Bronkema (Polygraph and lie detection, 2003; Krapohl, Stern, & Bronkema,
2009). Specifically, each came to the conclusion that risk tolerance, and the
corresponding decision rules, should be set by the consumer of polygraph results. That is,
this decision should not be left to the polygraphist but to the consumer of the results,
who ultimately decides what risk can be accepted in the decision-making process. In
short, the determination of decision rules is a policy decision and will come into play
later in the discussions of this research.

Two things become apparent in the literature. First, those who speak to the topic agree
on the paucity of research into the polygraph, and some note that research concerning
scoring techniques is even rarer. Second, the research that does exist on hand-scoring
techniques has examined the accuracy and reliability of each scoring technique, its
relative simplicity, and interrater reliability. What prior research lacks is the study of
normative data (Handler et al., 2010).

Another scoring technique, and the topic of this research, is the Empirical Scoring
System (ESS). This scoring system was first described by Krapohl, Nelson, and Handler
in 2008 (Krapohl, Nelson, & Handler, 2008). The development of and research on
ESS allowed, for the first time in a hand-scored polygraph technique, the application of
p-values and normative data. It is profound in its simplicity, and, based on the associated
p-value tables for specificity, sensitivity, and inconclusive rate, the decision maker or
policy setter can compare the probability of error
and choose the error rate that best fits his or her tolerance for risk. It is because of
this unique ability, in conjunction with the simplicity of its use, that ESS may prove to be
the most robust scoring technique and capable of protecting American lives and assets at
home and in the field of combat.
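
The comparison of a calculated probability of error against a stated tolerance can be
sketched as follows. The sketch does not reproduce the published ESS normative tables. It
assumes, purely for illustration, that totals from truthful examinees follow a normal
distribution with a hypothetical mean of 8 and standard deviation of 7, and that lower
totals indicate deception. The function names are likewise hypothetical.

```python
import math


def normal_cdf(x, mean, sd):
    """Cumulative probability of a normal distribution at x."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def error_estimate(total, truthful_mean, truthful_sd):
    """Probability that a truthful examinee would score this low or lower."""
    return normal_cdf(total, truthful_mean, truthful_sd)


def within_tolerance(total, alpha, truthful_mean, truthful_sd):
    """True if a deceptive call at this total meets the consumer's tolerance."""
    return error_estimate(total, truthful_mean, truthful_sd) <= alpha


# Hypothetical norms and a tolerance for error chosen by the consumer.
MEAN, SD, ALPHA = 8.0, 7.0, 0.05
for total in (-6, 1):
    p = error_estimate(total, MEAN, SD)
    print(total, round(p, 4), within_tolerance(total, ALPHA, MEAN, SD))
```

Under these invented norms, a total of -6 yields an error estimate of about 0.023 and
meets a tolerance of 0.05, while a total of 1 yields about 0.159 and does not.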

III. HYPOTHESES OR TENTATIVE SOLUTIONS


The polygraph is used in many circumstances for the purposes of national
security,  as  well  as  law  enforcement  and  security  issues  at  the  state  and  local  levels.  Its 
use  in  combat  zones  as  well  as  the  rear  areas  in  theatres  of  war  is  documented.  It  has 
proven  to  be  an  extremely  useful  tool  by  assisting  decision  makers  in  the  field  to  make 
both  strategic  and  tactical  decisions.  The  claim  is  that,  by  providing  polygraph  experts 
with  a  simpler  hand-scoring  technique,  based  on  empirical  data  to  which  probability 
values  have  been  determined,  they  in  turn  can  provide  these  decision  makers  with  a  more 
informative  answer  to  the  questions  at  hand.  In  the  combat  arena,  those  questions  can 
revolve  around  whether  or  not  to  undertake  a  tactical  operation  based  on  the  word  of  an 
informant,  collaborator,  or  captured  enemy  combatant.  Such  decisions  involve  great  risks 
to  life  and  limb,  and  the  decision  makers  must  be  given  the  best  tools  available  to  make 
them.  In  other  homeland  security  concerns,  they  can  involve  the  credibility  of  informants, 
witnesses,  accused  or  suspected  criminals,  spies,  and  other  ne’er-do-wells. 

Evidence  to  support  this  claim  can  be  found  in  the  review  conducted  by  the 
National  Research  Council  (NRC).  The  NRC  notes  that 

decision  scientists  and  policy  advisers  have  worked  to  develop  systematic 
methods  for  resolving  hard  decision  problems  that  arise  in  business, 
medicine  and  public  policy.  These  methods  are  used  explicitly  in  many 
scientific  articles,  and  they  are  used  implicitly  in  practical  advice,  where 
the  goal  is  to  get  decision  makers  to  think  systematically  before  acting. 
(Polygraph and lie detection, 2003, p. 358)

The  history  of  the  polygraph  is  such  that  the  lack  of  a  sound  scientific  basis,  in  the 
minds  of  some,  has  led  to  the  dismantling  of  various  polygraph  programs,  caused 
decision  makers  to  be  reluctant  to  rely  on  it — even  in  the  absence  of  alternatives — and 
caused much conversation in the halls of Congress, state houses, and local government
buildings as to its usefulness. It is a proven tool in the war on terror and national security.
The Empirical Scoring System is the simplest hand-scoring technique to have empirical
and scientific support as its foundation.

No polygraph programs have been dismantled at the federal level, and new federal programs have,
in fact, been added since the 2003 NRC report. However, legislative decisions and court rules have
impacted or outlawed polygraph programs at the state and local levels.

IV. SIGNIFICANCE OF RESEARCH


A.  LITERATURE 

There is a dearth of literature on scientific and empirically based hand-scoring
techniques  in  the  field  of  polygraph  examination,  particularly  the  impact  of  the 
techniques  used  on  the  robustness  of  decisions  taken  by  those  who  rely  on  the  polygraph 
to  assist  them  in  their  decision  making  process.  The  Empirical  Scoring  System  is  one  of 
the  first  and  simplest  hand-scoring  techniques  with  intent  to  anchor  TDA  on  empirical 
evidence and scientific study (Handler et al., 2010). This research should not only impact
the  use  of  the  polygraph  as  it  relates  to  national  security,  homeland  security,  and  the  war 
on  terror,  but  it  should  further  the  scientific  advancement  in  the  polygraph  community  as 
a  whole. 

B. FUTURE RESEARCH EFFORTS

This  research  will  reinforce  the  concept  that  a  solid  scientific  basis  for  the 
polygraph  will  enhance  its  use  and  make  it  more  readily  defensible.  The  National 
Research  Council  (NRC)  has  stated  that  no  lie  detection  technique  has  been  shown  to 
outperform the polygraph and none shows any promise in the near term (Polygraph and
lie detection, 2003, p. 173). However, it also notes that past efforts at polygraph research
have not laid a sound foundation of scientific knowledge in the field (Polygraph and lie
detection, 2003, p. 213). On page 221 of its review (Polygraph and lie detection, 2003),
the  NRC  goes  on  to  say  that  the  detection  of  deception  and  information  withholding  is 
important  to  national  security  and  that  “government  agencies  will  continue  to  seek 
accurate  ways  to  detect  deception  by  criminals,  spies,  terrorists,  and  others  who  threaten 
public  safety  and  security  interests.”  This  thesis  is  just  one  small  part  of  this  effort,  and  it 
is  hoped  that  it  encourages  others  in  the  field  and  those  who  are  consumers  of  its  product 
to  engage  in  further  scientific  study,  particularly  as  it  relates  to  security  on  the  national, 
state,  and  local  levels. 

C. CONSUMERS

The immediate consumers of this research are the Department of Defense and its
various military branches, as well as all federal agencies that have polygraph programs in
place  as  part  of  their  national  and  internal  security  interests.  Further,  all  state  and  local 
law  enforcement  and  criminal  justice  agencies  who  rely  on  polygraph  results  as  part  of 
their  decision-making  process  should  find  this  research  useful.  It  is  anticipated  that  the 
three  national  polygraph  associations — the  American  Association  of  Police 
Polygraphists,  the  American  Polygraph  Association,  and  the  National  Polygraph 
Association — will  utilize  this  research  in  the  training  and  education  of  their  respective 
members. 

D. HOMELAND SECURITY PRACTITIONERS AND LEADERS NATIONALLY

This  research  should  be  of  interest  to  many  federal  program  managers  within 
DHS  and  various  federal  agencies  outside  DHS,  both  those  who  use  the  polygraph  and 
others who may not for various reasons. As this is just one small step in an effort to place a
component of lie detection on a sound scientific basis, it can be anticipated that those
who  have  been  reluctant  to  utilize  the  polygraph,  or  perhaps  even  those  who  have  been 
detractors  of  the  field,  might  be  encouraged  and  convinced  to  reconsider  their  positions. 

V. METHOD


The present research based its analysis on raw data previously collected by other
researchers  instrumental  in  the  development  of  the  Empirical  Scoring  System  (ESS),  who 
removed  identifiers  from  the  data  and  subsequently  provided  it  for  study  here. 

Data  was  obtained  from  three  groups  (Cohorts  1,  2,  and  3)  of  four  scorers  each. 
These  participants  were  randomly  grouped  volunteers  from  a  group  of  300  students 
trained  in  the  Empirical  Scoring  System  as  part  of  a  training  seminar  hosted  by  the 
American Association of Police Polygraphists14 in Cambridge, Massachusetts, on March
28,  2011.  Cohort  #1  scored  the  sample  examinations  using  the  Empirical  Scoring  System 
(see  Appendix  A).  Cohort  #2  scored  the  examinations  with  the  three-position  Test  Data 
Analysis  (TDA)  system  (DACA,  2006)  (see  Appendix  B),  and  Cohort  #3  scored  them 
using  the  seven-position  TDA  system  (DACA,  2006)  (see  Appendix  C). 

The  Empirical  Scoring  System  is  an  evidence-based  numerical  hand-scoring 
technique  used  for  test  data  analysis  of  polygraph  charts  obtained  from  comparison 
question tests (Nelson et al., 2012). The ESS utilizes a three-position scale of +, 0,
or − and relies on the bigger-is-better rule;15 scores are assigned when the scorer visually
observes a difference in reaction strength between relevant and comparison questions
(Nelson et al., 2012). A positive score (+) is assigned when there is a larger response to a
comparison question, and a negative (−) score is assigned when there is a larger response
to a relevant question. In typical comparison-question test formats, relevant questions are
normally compared to comparison questions (Nelson et al., 2012).
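
The sign convention just described can be sketched in a few lines of code. This is an illustration only and not the published ESS procedure: in practice the score rests on the scorer's visual judgment of the tracings, so the numeric response magnitudes and the function name `ess_sign_score` are assumptions introduced here.

```python
def ess_sign_score(relevant_response: float, comparison_response: float) -> int:
    """Return +1, 0, or -1 following the sign convention described in the text.

    The two arguments are hypothetical numeric stand-ins for what a scorer
    judges visually; they are not part of the ESS instructions.
    """
    if comparison_response > relevant_response:
        return 1   # larger response to the comparison question
    if relevant_response > comparison_response:
        return -1  # larger response to the relevant question
    return 0       # no observable difference, so no score is assigned


if __name__ == "__main__":
    print(ess_sign_score(relevant_response=2.0, comparison_response=5.0))
    print(ess_sign_score(relevant_response=5.0, comparison_response=2.0))
    print(ess_sign_score(relevant_response=3.0, comparison_response=3.0))
```

A score of zero here stands for the case in which the scorer cannot point to a difference and support it.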


14 The American Association of Police Polygraphists is the largest law enforcement polygraph
association in the world. The author is both a past and current president.

15 The instructions for the rule are simple: if you can see it, point to it, and support that the reaction is
bigger, then you score it. If you can’t point to it and support it, then do not assign a score.

In  “Terminology  Reference  for  the  Science  of  Psychophysiological  Detection  of 
Deception”16 (Krapohl & Sturm, 2002b), the seven- and three-position TDA systems are
defined  as  follows: 

7-position  scale 

System  of  assigning  values  to  individual  physiological  responses  in  PDD, 
based  on  differential  responding  to  relevant  and  comparison  questions. 

The  values  in  7-position  scoring  are  whole  numbers  between  -3  and  +3. 

By  convention,  negative  values  represent  greater  responding  to  relevant 
questions,  while  positive  values  indicate  greater  responses  to  comparison 
questions.  A  zero  usually  indicates  equal  or  no  reactions  to  the 
relevant  and  comparison  questions,  or  that  the  spot  does  not  meet 
minimum  standards  for  interpretation.  The  assigned  numbers  are  summed 
across  all  three  PDD  parameters  for  each  question  for  all  spots  and  all 
charts.  There  are  thresholds  for  determinations  of  truthfulness  or 
deception,  with  an  inconclusive  region  separating  them.  In  the  PDD 
literature,  the  7-position  scale  is  sometimes  referred  to  as  a  semi-objective 
scoring  system.  There  are  three  major  versions  of  the  7-position  scoring 
system:  Backster,  Utah,  and  DoDPI.  See:  Bell,  Raskin,  Honts,  &  Kircher 
(1999);  Swinford  (1999);  Weaver  (1985) 

3-position  scale 

Abbreviated  form  of  the  7-position  scale  for  PDD  test  data  analysis.  The 
major  difference  is  that  the  range  of  values  for  each  comparison  is  from  -1 
to  +1,  rather  than  the  range  of  -3  to  +3  in  the  7-position  scoring  system. 

See:  Capps  &  Ansley  (1992);  Krapohl  (1998);  Van  Herk  (1990) 
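
The summation and threshold logic in these definitions can be illustrated with a minimal sketch. The cutoff values below are hypothetical placeholders chosen only to show an inconclusive region between two thresholds; they are not the decision rules of any particular TDA system, and the function names are likewise assumptions.

```python
from typing import Dict, List


def grand_total(scores_by_sensor: Dict[str, List[int]]) -> int:
    """Sum the assigned scores across all sensors, spots, and charts."""
    return sum(sum(scores) for scores in scores_by_sensor.values())


def classify(total: int, truthful_cutoff: int, deceptive_cutoff: int) -> str:
    """Map a grand total to a categorical result.

    Both cutoffs are supplied by the caller; the values used in the example
    below are hypothetical and serve only as an illustration.
    """
    if total >= truthful_cutoff:
        return "NDI"  # no deception indicated
    if total <= deceptive_cutoff:
        return "DI"   # deception indicated
    return "INC"      # inconclusive region between the two thresholds


if __name__ == "__main__":
    # Hypothetical 3-position scores for two relevant questions on three charts.
    example = {
        "pneumograph": [1, 0, 1, 0, 0, 1],
        "electrodermal": [1, 1, 0, 1, 1, 0],
        "cardiograph": [0, 1, 0, 0, 1, 0],
    }
    total = grand_total(example)
    print(total, classify(total, truthful_cutoff=4, deceptive_cutoff=-4))
```

Because the cutoffs are parameters, the same sketch applies to the 7-position scale by supplying scores between -3 and +3.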

The  analysis  method  applied  to  the  research  questions  was  an  analysis  of  variance 
(ANOVA), which will be further described for each research question.17


16 “Psychophysiological detection of deception” (PDD) is a term used primarily by the federal
government and is interchangeable with the terms “polygraph” and “lie detector.”

17 Special gratitude is expressed to Raymond Nelson for his computational assistance.

VI. RESULTS AND ANALYSIS


Each  study  participant  provided  a  demographic  data  form  (see  Appendix  D).  This 
demographic data included age and experience as a polygraphist, as well as gender.

The  average  age  was  54,  with  a  standard  deviation  of  three.  The  maximum  age 
was  65,  and  the  minimum  age  was  37.  The  median  age  was  58.  Ages  do  not  appear 
normally  distributed. 

There were ten males and two females. With only two females (n = 2), the group is too small
for analysis. Compared with groups of equal size, the difference in group size is significant
(test of proportions, Z = 9.334, p < .001). Gender was not evaluated as an independent
variable in the remainder of the analysis.

The average experience was 15 years, with a standard deviation of three.
Median experience was 14 years, and the mode was also 14 years. The maximum
experience was 33 years, and the minimum was three. Proximity of the mean, median, and
mode indicated no increased concerns regarding the normality of the distribution of
participant experience.

The participants in the study included four private examiners, seven law
enforcement  examiners,  and  one  federal  examiner. 

Additional data collected on the hand-score sheets were the individual scores
assigned by the participant to the two relevant questions on the three charts of each
examination in the study sample. A score was assigned, according to the structured rubric
for each scoring system, for the tracings of each of these sensors: pneumograph,18
electrodermal (EDA),19 and cardiograph.20 Subtotal scores were calculated for each of
the relevant questions, and a grand total was calculated for the test as a whole. Scores
were then interpreted using structured decision rules, according to the requirements of
each scoring method, to make categorical determinations as to no deception indicated
(NDI),21 deception indicated (DI),22 or inconclusive (INC).23 Inconclusive is sometimes
referred to as “no opinion” or “indefinite.” Each participant then recorded a personal
confidence level in the opinion rendered (see Appendix E).

18 The pneumograph sensors, one tube placed around the abdomen and another around the thorax,
record respiration data. Features included in the manual scoring model pertain primarily to suppression or
reduction of respiration activity.

19 Changes in the electrical properties of the skin (exosomatic and endosomatic) typically measured by
placement of electrodes on the central pad of skin of two fingers. This term superseded the term “galvanic
skin response” (GSR), which can still occasionally be found in the older literature.

20 A term for recording heart activity, typically done by placement of a blood pressure cuff on an arm,
which then measures pulse wave and changes in relative arterial blood pressure. In this context it is more
correctly called sphygmograph or plethysmography (Krapohl & Sturm, 2002b).

A.  RESEARCH  QUESTION  #1 

1.  Results 

Do  differences  exist  in  the  effectiveness  of  the  three-position,  seven-position,  and 
ESS  test  data  analysis  systems  at  extracting  diagnostic  information  from  the  raw  data?24 

It  is  important  to  understand  that  the  end  result  of  any  polygraph  examination, 
whether  for  event-specific  criminal  investigations,  security  screening,  law  enforcement 
pre-employment,  or  postconviction  supervision  of  convicted  offenders,  is  a  set  of  tracings 
(charts)  that  can  be  systematically  analyzed  to  make  determinations  of  truthfulness  or 
deception  at  rates  that  are  greater  than  can  be  obtained  by  other  methods.  Other 
professions,  such  as  medicine  and  education,  use  both  diagnostic  and  screening  methods 
in  their  respective  fields.  The  scientific  work  that  has  been  applied  to  these  methods  can 
also be applied to polygraph examination (Polygraph and lie detection, 2003). Among
consumers  of  the  information  in  both  the  medical  and  educational  testing  methods,  there 
is a general implicit understanding that test results are helpful to professional decision
making in that scientific test results have been shown to be significantly better than
chance, even if imperfect. This assumption is based on several predicate assumptions:
that those administering and analyzing the tests have acquired advanced training and
education, and that these practitioners are qualified in their respective fields to select,
administer, and interpret tests that will provide information that will assist the referring
professionals to make better decisions.

21 No deception indicated, in layman’s terms, means that it is the polygraphist’s opinion that the
person is truthful as to the matter at hand.

22 Deception indicated, in layman’s terms, means that it is the polygraphist’s opinion that the person is
not truthful (lying) as to the matter at hand.

23 Inconclusive, in layman’s terms, means that the polygraphist has no opinion as to whether the
person is truthful or lying as to the matter at hand. It is typical that “no opinion” is rendered when the
diagnostic quality of the tracings is such that they cannot be analyzed. It is the author’s experience that
those within the field of polygraph scoring do not consider “no opinion” as an error and that in many cases
with subsequent testing (sometimes called a “reexamination”) a definitive opinion can be rendered.
However, it is duly noted that some outside the profession consider “no opinion” to be an error, and in
research this dissenting opinion is sometimes taken into account.

24 Mr. Raymond Nelson assisted in the research question designs, as well as the computation of the
tables and figures and the interpretation of the results and analysis.

Although  signal  detection  theory25  is  not  an  integral  part  of  this  thesis,  it  is 
important  to  understand  that  the  diagnostic  analysis  of  polygraph  tracings  involves  signal 
detection,  particularly  as  an  underpinning  in  the  scientific  work  necessary  to  advance  the 
field.  Signal  detection  involves  the  diagnostician’s  being  able  to  distinguish  between 
signals  and  noise.  McNicol  called  it  “a  theory  about  the  way  in  which  choices  are  made” 
(McNicol,  2005).  Signal  information  is  diagnostic  information  that  the  observer  wants  to 
see,  and  noise  is  any  nonsignal  information  or  background  noise  (Keating,  2005)  that  can 
make  the  identification  of  diagnostic  information  difficult.  Clearly,  extracting  diagnostic 
information  from  the  “raw”  data  of  polygraph  tracings  involves  the  diagnostician — in  this 
case  a  polygraphist  or  blind  reviewer — making  observations  about  two  states  and 
assigning an assessment of which state he observes. Test sensitivity (Polygraph and lie
detection, 2003) involves the effectiveness with which signal information can be
extracted  and  used  to  identify  the  issue  of  concern.  Test  specificity  also  involves  the 
effectiveness  with  which  the  absence  of  signal  information  is  determined  and  affects  the 
ability  of  a  test  to  determine  when  the  issue  of  concern  is  not  present.  Harvey  further 
describes  this  phenomenon  in  “Detection  Sensitivity  and  Response  Bias.”  He  explains 
that  the  “detection  performance”  (diagnostics)  is  based  on  both  a  sensory  process  and  a 
decision  process.  A  simple  yes  or  no  can  be  the  response  as  to  whether  or  not  a  signal 
was  present,  or  there  can  be  a  “rating  of  the  confidence”  that  a  signal  was  present.  In  the 
case  of  most  polygraph  TDA  systems,  the  response  is  a  yes  or  no,  with  the  value  of  yes 
described in a positive or negative number. This involves a sensory process (sensitivity)
as well as a decision process with a defined criterion parameter (in this case, the
instructions contained on the hand-score sheet) (Harvey, 2003).

25 As one might deduce, signal detection theory had its early beginnings with those researching radar.
Its psychological roots began in the 1950s and were primarily led by John A. Swets (Herbert, 2010). See
Herbert for an insightful article about Mr. Swets and signal detection theory.

In signal detection theory, this sensory and response criteria process involves
“hits,” “misses,” and “false alarms.” A hit occurs when the diagnostician says yes to a signal
that is present (hit rate), and a miss occurs when the diagnostician says no to a signal that
is present (miss rate). A false alarm (false-alarm rate) occurs when the diagnostician says
yes to a signal that is not present, meaning that noise was wrongly identified as a signal.
Table 2 displays these conditional probabilities.


Table 2.  Conditional Probabilities, Signal Detection Theory

                  “Yes”                      “No”
Signal Present    Hit Rate (HR)              Miss Rate (MR)
Signal Absent     False Alarm Rate (FAR)     Correct Rejection Rate (CRR)

In polygraph, the FAR and MR are respectively known as false positive26 and
false negative.27
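
The four conditional probabilities in Table 2 can be computed from raw counts as in the sketch below. The counts in the example are hypothetical and are not data from this study; the function name `sdt_rates` is an assumption made for illustration.

```python
from typing import Dict


def sdt_rates(hits: int, misses: int, false_alarms: int,
              correct_rejections: int) -> Dict[str, float]:
    """Return the four conditional probabilities of Table 2 from raw counts."""
    signal_present = hits + misses
    signal_absent = false_alarms + correct_rejections
    return {
        "HR": hits / signal_present,                 # "yes" when the signal is present
        "MR": misses / signal_present,               # "no" when the signal is present
        "FAR": false_alarms / signal_absent,         # "yes" when the signal is absent
        "CRR": correct_rejections / signal_absent,   # "no" when the signal is absent
    }


if __name__ == "__main__":
    # Hypothetical counts, for illustration only.
    print(sdt_rates(hits=8, misses=2, false_alarms=1, correct_rejections=9))
```

Within each row of Table 2 the two rates sum to one, which is why reporting the hit rate and the false alarm rate is enough to describe the full table.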

2. Analysis Method

Three-position  and  seven-position  TDA  numerical  scores  were  transformed  to 
ESS  scores  and  subjected  to  a  2  x  3  ANOVA  (criterion  state  x  TDA  system)  for  absolute 
magnitude  of  mean  numerical  scores.  Transformation  to  a  common  numerical  scale 
ensures  that  differences  are  not  attributable  to  scale  differences  and  are  a  reflection  of 
differences  in  the  effectiveness  with  which  examiners  extract  diagnostic  (signal) 
information  using  the  three  TDA  systems. 


26 The false detection of something that is not actually present. In polygraph it is the incorrect decision
that deception was practiced by the examinee (Krapohl & Sturm, 2002b).

27 The failure to detect the presence of a particular event or item. A false negative in polygraph refers
to the incorrect decision that deception was not practiced by the examinee (Krapohl & Sturm, 2002b).

Figure 1.  Mean and Standard Deviations for Numerical Scores

Table 3.  2 x 3 ANOVA Summary (Criterion State x TDA Model) for Mean Scores

Source             SS^a        df^b    MS^c      F^d      p^e     F crit .05^f
TDA System         96.212      2       1.093     0.032    .969    3.031
Criterion state    284.379     1       2.154     0.063    .803    3.878
Interaction        44.576      1       44.576    1.294    .256    3.878
Error              8890.591    258     34.460
Total              425.167     262

^a SS = Sum of Squares
^b df = Degrees of Freedom
^c MS = Mean Square
^d F = The F Value
^e p = Probability Value
^f F crit .05 = Critical Value of F with α = .05


The ANOVA produced no significant differences, an indication that
each of the TDA systems is capable of extracting similar signal (diagnostic) information
from the raw data. That is, using any one of the three TDA systems, the polygraphist
should be able to observe the criterion for truthfulness or deception, with no one system
being more or less diagnostic.
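
The comparison behind this conclusion can be sketched as follows: each F value is the effect mean square divided by the error mean square, and an effect is treated as significant only when F exceeds the critical value. The mean squares and critical values are those reported in Table 3; the variable and function names are assumptions made for illustration.

```python
# Mean squares and critical F values as reported in Table 3.
MS_ERROR = 34.460
EFFECTS = {
    "TDA system": {"ms": 1.093, "f_crit": 3.031},
    "Criterion state": {"ms": 2.154, "f_crit": 3.878},
    "Interaction": {"ms": 44.576, "f_crit": 3.878},
}


def f_ratio(ms_effect: float, ms_error: float) -> float:
    """F statistic: effect mean square divided by error mean square."""
    return ms_effect / ms_error


def summarize(effects: dict, ms_error: float) -> dict:
    """Compute F for each effect and flag whether it exceeds its critical value."""
    summary = {}
    for name, row in effects.items():
        f_value = f_ratio(row["ms"], ms_error)
        summary[name] = {
            "F": round(f_value, 3),
            "significant": f_value > row["f_crit"],
        }
    return summary


if __name__ == "__main__":
    for name, row in summarize(EFFECTS, MS_ERROR).items():
        print(name, row)
```

None of the three ratios approaches its critical value, which is the basis for the statement that no significant differences were found.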

B. RESEARCH QUESTION #2

1. Results

Are there significant differences in criterion accuracy for the three-position,
seven-position, and ESS TDA systems?

Criterion accuracy (validity) refers to how effectively the testing system places
individual cases in the correct criterion category. In polygraph, the signals intended to be
captured are the test results of deception indicated or no deception indicated. In the case
of a single-issue examination, such as a criminal investigation or event-specific incident,
this measure (criterion) is the polygraphist’s opinion about the examinee’s deception or
truthfulness corresponding to actual truthfulness (ground truth).

Within  signal  detection  theory,  one  measure  of  stimulus  is  sensitivity,  discussed 
below.  Another  measure  within  signal  detection  is  response  bias.  This  thesis  does  not 
research  response  bias,  and  it  is  left  for  future  research;  however,  it  is  important  to 
understand that the phenomenon exists. Response bias is the tendency of the
diagnostician to favor, that is, to be biased toward, the selection of one response over
another. The more features available, the more opportunities for a diagnostician to
become  biased.  Detection  theory  allows  for  determining  or  delimiting  the  distributions 
consistent  with  bias  or  sensitivity  and  specificity  of  a  test  measure.  Sensitivity  and  bias 
taken  together  all  lead  to  a  decision  system  in  which  the  stimulus  classes  reach  equal- 
variance  normal  distributions  for  the  decision  variable,  making  them  more  meaningful. 
This  decision  system  can  then  be  tested  using  receiver  operating  characteristic  curves, 
which  then  leads  us  graphically  to  the  proportion  of  hits  (signal)  to  the  proportion  of  false 
alarms  (noise).  This  becomes  important  in  determining  how  to  manipulate  response 
bias, either through instruction or by use of a confidence rating (p value) (Macmillan &
Creelman, 1996). More specifically, as response bias relates to polygraph scoring, the
development  of  the  ESS-TDA  method  is  designed  to  reduce  the  response  bias  of 
polygraphists.  Specifically,  older  TDA  methods  relied  on  more  features  and  criteria  to 
arrive  at  a  final  score.  These  attributes  make  the  scoring  methods  difficult  to  learn 
(instruction)  and  more  subjective  (introducing  response  bias),  with  less  interrater 
reliability  (Blalock,  Cushman,  &  Nelson,  2009).  ESS  utilizes  the  “bigger-is-better”  rule, 
which  means  fewer  features  to  score  allows  for  ease  in  learning.  Also,  the  ESS  is  the  only 
hand-scoring  method  that  has  a  p-value  table.  The  use  of  the  p-value  addresses  the  second 
method  of  dealing  with  response  bias — the  use  of  confidence  rating.  Again,  response  bias 
is  a  topic  for  future  research,  but  it  is  mentioned  here  to  demonstrate  that  ESS  addresses  it 
and  that  the  p-values  associated  with  ESS  allow  for  criterion  selection  that  addresses 
levels  of  sensitivity. 
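
For readers unfamiliar with how sensitivity and response bias are separated under the equal-variance normal model mentioned above, the following sketch computes the standard detection-theory indices d′ (sensitivity) and c (criterion location) from a hit rate and a false alarm rate. These indices are not computed in this thesis; the sketch is an illustration under that model, and the function name is an assumption.

```python
from statistics import NormalDist
from typing import Tuple


def d_prime_and_criterion(hit_rate: float, false_alarm_rate: float) -> Tuple[float, float]:
    """Return (d', c) under the equal-variance normal model.

    d' = z(HR) - z(FAR) indexes sensitivity; c = -(z(HR) + z(FAR)) / 2 indexes
    response bias. Both rates must lie strictly between 0 and 1, because the
    inverse normal CDF is undefined at the endpoints.
    """
    z = NormalDist().inv_cdf
    z_hit = z(hit_rate)
    z_fa = z(false_alarm_rate)
    return z_hit - z_fa, -0.5 * (z_hit + z_fa)


if __name__ == "__main__":
    # Hypothetical rates, for illustration only.
    d_prime, criterion = d_prime_and_criterion(hit_rate=0.9, false_alarm_rate=0.1)
    print(d_prime, criterion)
```

A criterion of zero indicates no preference for either response; the sign of a nonzero value shows which response the diagnostician favors.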

Sensitivity is but one aspect of accuracy (validity). If deception is perfectly
indicated whenever a lie is present, then the signal proves positive (deceptive) whenever
a lie is present; the measure is positive for deceptive in all positive cases and no false
negatives are produced. In other words, the test has perfect sensitivity (Polygraph and lie
detection, 2003).

Specificity is the other aspect of accuracy. If the signal always shows negative when
deception is absent, then the test is perfectly specific to deception; it produces no
false positives. A test is more specific the greater the proportion of truthful persons who
appear nondeceptive on the test (Polygraph and lie detection, 2003).


2.  Analysis  Method 

The  analysis  method  used  was  multivariate  ANOVAs  (criterion  state  x  TDA 
system)  for  decisions  with  inconclusives  (i.e.,  test  sensitivity  to  deception  and  test 
specificity  to  truthfulness),  inconclusive  rates,  and  error  rates. 
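
The six accuracy dimensions analyzed here can be derived from a list of scored cases as in the sketch below. The case data are hypothetical and the function name is an assumption; the sketch shows only how the rates relate to one another when inconclusive results are counted in the denominator.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple


def accuracy_dimensions(cases: Iterable[Tuple[str, str]]) -> Dict[str, float]:
    """Compute decision, inconclusive, and error rates by criterion state.

    Each case is (criterion_state, decision), where criterion_state is
    "deceptive" or "truthful" and decision is "DI", "NDI", or "INC".
    """
    counts = Counter(cases)
    n_deceptive = sum(n for (state, _), n in counts.items() if state == "deceptive")
    n_truthful = sum(n for (state, _), n in counts.items() if state == "truthful")
    return {
        "sensitivity": counts[("deceptive", "DI")] / n_deceptive,
        "inc_d": counts[("deceptive", "INC")] / n_deceptive,
        "fn_errors": counts[("deceptive", "NDI")] / n_deceptive,
        "specificity": counts[("truthful", "NDI")] / n_truthful,
        "inc_t": counts[("truthful", "INC")] / n_truthful,
        "fp_errors": counts[("truthful", "DI")] / n_truthful,
    }


if __name__ == "__main__":
    # Hypothetical decisions, for illustration only.
    example = (
        [("deceptive", "DI")] * 8 + [("deceptive", "INC")] + [("deceptive", "NDI")]
        + [("truthful", "NDI")] * 6 + [("truthful", "INC")] * 3 + [("truthful", "DI")]
    )
    print(accuracy_dimensions(example))
```

Within each criterion state the three rates sum to one, so a higher inconclusive rate necessarily lowers sensitivity or specificity when inconclusives are included.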


Table 4.  Means, (Standard Deviations), and {95% Confidence Intervals} for Criterion Accuracy

              3-position                          7-position                          ESS
Sensitivity   .886 (.087) {.716 to >.999}         .841 (.136) {.574 to >.999}         .886 (.045) {.797 to .975}
Specificity   .591 (.091) {.413 to .769}          .614 (.087) {.443 to .784}          .727 (.129) {.475 to .979}
Inc D         .114 (.087) {<.001 to .284}         .159 (.136) {<.001 to .426}         .114 (.045) {.025 to .203}
Inc T         .341 (.155) {.037 to .645}          .341 (.114) {.11? to .565}          .182 (.148) {<.001 to .473}
FN Errors     <.001 (<.001) {<.001 to <.001}      <.001 (<.001) {<.001 to <.001}      <.001 (<.001) {<.001 to <.001}
FP Errors     .068 (.087) {<.001 to .239}         .045 (.052) {<.001 to .148}         .091 (.074) {<.001 to .236}

Figure 2.  Mean Plot for Decisions, Errors, and Inconclusive Results


Table 5.  Three-way (2 x 3 x 3) ANOVA Contrast for Test Accuracy (Criterion State x TDA System x Accuracy Dimension)

Source                                        SS       df    MS       F          p       F crit .05
Criterion dimension                           6.814    2     3.407    392.260    .000    3.168
Status                                        0.000    1     0.000    0.018      .893    4.020
TDA system                                    0.000    2     0.000    0.013      .987    3.168
Criterion dimension x status                  0.504    2     0.252    28.985     .000    3.168
Status x TDA system                           0.000    2     0.000    0.013      .987    3.168
Criterion dimension x TDA system              0.082    4     0.020    2.352      .065    2.543
Criterion dimension x status x TDA system     0.054    4     0.014    1.559      .198    2.543
Error                                         0.469    54    0.009
Total                                         7.923    71


The value of this three-way contrast is that it encompasses the entire experimental
question; it provides greater degrees of freedom; and it provides more power than a series
of two-way analyses.
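
The degrees of freedom in Table 5 can be reproduced with the standard bookkeeping for a fully crossed factorial design, as sketched below. The sketch assumes four observations per cell (one per scorer in each cohort) and treats all 72 rate values as independent observations; that assumption is made here for illustration and is not stated in the analysis itself. The function and factor names are likewise hypothetical.

```python
from itertools import combinations
from math import prod
from typing import Dict


def factorial_anova_df(levels: Dict[str, int], replicates: int) -> Dict[str, int]:
    """Degrees of freedom for every effect in a fully crossed factorial ANOVA.

    levels maps each factor name to its number of levels; replicates is the
    number of observations per cell (an assumption of this sketch).
    """
    names = list(levels)
    df = {}
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            # An effect's df is the product of (levels - 1) over its factors.
            df[" x ".join(combo)] = prod(levels[name] - 1 for name in combo)
    n_cells = prod(levels.values())
    n_observations = n_cells * replicates
    df["Error"] = n_observations - n_cells
    df["Total"] = n_observations - 1
    return df


if __name__ == "__main__":
    design = {"criterion dimension": 3, "status": 2, "TDA system": 3}
    for effect, value in factorial_anova_df(design, replicates=4).items():
        print(effect, value)
```

Under these assumptions the effect, error, and total degrees of freedom agree with the values listed in Table 5.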

The three-way interaction was not significant, which suggests no
statistically significant differences in the accuracy of the three compared TDA systems.
The two-way interaction of criterion dimension x case status was significant.
This suggests that the different TDA systems may perform differently with
criterion truthful and criterion deceptive cases.

In  this  instance,  differences  in  criterion  dimension  are  expected,  in  that  it  is  hoped 
that  error  and  inconclusive  rates  are  lower  than  decision  accuracy  rates.  This  main  effect 
did  not  undergo  additional  analysis.  The  most  significant  interaction  in  the  three-way 
analysis was the two-way interaction of criterion dimension x case status. Again, this
interaction supports the expectation that correct, inconclusive, and erroneous results will
not occur in similar proportions. Because of this, two-way post-hoc ANOVAs were
completed  for  each  of  the  three  dimensions  of  test  accuracy:  decisions,  errors,  and 
inconclusive  results. 


Figure 3.  Mean Plot for Sensitivity and Specificity (plotted series: Crit. Deceptive, Crit. Truthful)

Table 6.  Two-way ANOVA Summary (Case Status x TDA System) for Decision Accuracy, Including Inconclusive Results (i.e., Sensitivity and Specificity)

Source             SS       df    MS       F        p       F crit .05
TDA System         0.030    2     0.004    0.366    .698    3.555
Criterion state    0.310    1     0.026    2.557    .127    4.414
Interaction        0.019    1     0.019    1.841    .192    4.414
Error              0.182    18    0.010
Total              0.358    22


Neither  the  two-way  interaction  nor  the  main  effects  for  case  status  or  TDA 
system  were  significant  for  sensitivity  and  specificity. 


Figure 4. Mean Plot for Inconclusive Results (series: Crit. Deceptive, Crit. Truthful)
Table 7. Two-way ANOVA Summary (Case Status x TDA System) for Inconclusive Results

Source           SS     df  MS     F      p     F crit .05
TDA System       0.050   2  0.006  0.472  .631  3.555
Criterion state  0.167   1  0.014  1.043  .321  4.414
Interaction      0.034   1  0.034  2.534  .129  4.414
Error            0.240  18  0.013
Total            0.251  22

Neither the two-way interaction nor the main effects for case status or TDA
system were significant for inconclusive rates.


Figure  5.  Mean  Plot  for  Errors 
Table 8. Two-way ANOVA Summary (Case Status x TDA System) for Errors

Source           SS     df  MS     F      p     F crit .05
TDA System       0.002   2  0.000  0.098  .907  3.555
Criterion state  0.027   1  0.002  0.855  .367  4.414
Interaction      0.002   1  0.002  0.782  .388  4.414
Error            0.048  18  0.003
Total            0.031  22

Neither  the  interaction  nor  the  main  effects  of  case  status  and  TDA  system  were 
significant  for  errors. 

The results of these analyses indicate that the three-position, seven-position, and
ESS TDA systems produced numerically different rates of correct, erroneous, and
inconclusive results, but the differences among the three TDA systems were not
statistically significant. This may be a result of the sample size and the size of the
cohorts. Larger sample sizes and larger cohorts may produce significant differences.

No  statistical  power  analysis  was  completed.  Confidence  intervals  can  be  found  in 
the  table  of  means  (Table  4). 

No false-negative errors occurred in this study. By comparison, Krapohl (2006) reported
a field study with a false-negative rate of 2.7%. The current error rate should be taken as
statistically meaningless; it is unrealistic to expect this in field settings or larger
studies, and the result should be used with caution.

C.  RESEARCH  QUESTION  #3 
1.  Issue  Posed 

What  is  the  effect  on  the  accuracy  of  transforming  three-position  and  seven- 
position  scores  to  ESS  scores? 
2. Analysis Method

ESS  scoring  rules  were  applied  to  three-position  and  seven-position  TDA 
systems,  and  a  two-way  ANOVA  (TDA  system  x  ESS  transformation)  was  calculated. 


Figure 6. Three-Position Score Transformed to ESS Scores

Figure 7. Seven-Position Score Transformed to ESS Scores (series: Unweighted Accuracy,
Unweighted Inconclusives)
Table 9. Unweighted Accuracy

            Raw                          ESS transformed
3-position  .957 (.054) {.851 to >.999}  .948 (.041) {.867 to >.999}
7-position  .968 (.037) {.895 to >.999}  .988 (.025) {.939 to >.999}

Table 10. Unweighted Inconclusives

            Raw                        ESS transformed
3-position  .227 (.052) {.124 to .33}  .091 (.037) {.018 to .164}
7-position  .250 (.114) {.026 to .474} .148 (.068) {.014 to .281}

Table 11. Two-way ANOVA Summary (TDA System x ESS Transformation) for Accuracy

Source          SS     df  MS     F      p     F crit .05
Transformation  0.000   1  0.000  0.009  .928  4.747
TDA System      0.003   1  0.000  0.197  .665  4.747
Interaction     0.001   1  0.001  0.467  .507  4.747
Error           0.020  12  0.002
Total           0.004  15

No  significant  differences  were  found  between  the  distributions  of  ESS  scores  and 
the  transformed  three-position  and  seven-position  scores  when  a  two-way  ANOVA  was 
conducted.  Also,  there  were  no  significant  differences  in  unweighted  accuracy  when 
transforming  the  scores  of  these  TDA  models  to  ESS  scores. 
Table 12. Two-way ANOVA Summary (TDA System x ESS Transformation) for
Inconclusive Results

Source          SS     df  MS     F      p     F crit .05
Transformation  0.057   1  0.007  1.302  .276  4.747
TDA System      0.006   1  0.001  0.145  .710  4.747
Interaction     0.001   1  0.001  0.213  .653  4.747
Error           0.066  12  0.005
Total           0.064  15

There were also no significant differences between the three-position and seven-position
TDA systems when a two-way ANOVA was conducted for inconclusive results. A larger
study may produce the statistical power needed to detect the expected improvement.

D. RESEARCH QUESTION #4

How accurate are the combined 3-position, 7-position, and ESS TDA results?
How accurate are the combined results when all scores are transformed to ESS scores? Is
the difference significant?


Table 13. Accuracy of ESS, Three-position and Seven-position Scores Combined

                          Raw scores                   All scores transformed to ESS scores
Unweighted Accuracy       .957 (.043) {.874 to >.999}  .961 (.040) {.883 to >.999}
Unweighted Inconclusives  .208 (.090) {.032 to .384}   .129 (.064) {.004 to .254}
Figure 8. Raw Scores and Transformed ESS Scores


Table 14. Two-way ANOVA Contrast (Transformation x Accuracy Dimension) for Test Accuracy

Source          SS     df  MS     F       p     F crit .05
Transformation  0.017   1  0.001   0.186  .669  4.062
Dimension       7.498   1  0.312  80.381  .000  4.062
Interaction     0.021   1  0.021   5.330  .026  4.062
Error           0.171  44  0.004
Total           7.537  47

Because it is known that the proportion of inconclusive results differs from the
proportion of correct decisions, a significant main effect for accuracy dimension is
expected. The significant interaction of transformation and accuracy dimension suggests
that ESS-transformed scores produce different types of changes in decisions and
inconclusive results: decision accuracy increases and inconclusive results decrease.
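
The reading of Table 14 follows the usual decision rule that an effect is significant at the .05 level when its F value exceeds the critical F value. The sketch below applies that rule to the F and F crit .05 columns as reported in Table 14; the function name and data layout are assumptions made for illustration.

```python
def significant_effects(rows):
    """Return the names of effects whose F value exceeds the critical F value.

    rows maps an effect name to an (F, F_crit) pair.
    """
    return [name for name, (f_value, f_crit) in rows.items() if f_value > f_crit]


# F and F crit .05 values as reported in Table 14.
table_14 = {
    "Transformation": (0.186, 4.062),
    "Dimension": (80.381, 4.062),
    "Interaction": (5.330, 4.062),
}
flagged = significant_effects(table_14)
print(flagged)  # ['Dimension', 'Interaction']
```

The two flagged rows match the effects described in the text: the main effect for accuracy dimension and the transformation x accuracy dimension interaction.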

One-way differences for decision accuracy were not significant [F(1,22) = 0.004,
p = 0.952]. One-way differences for inconclusive results were also not significant
[F(1,22) = 0.522, p = 0.478]. A larger sample may have found a significant difference in
these results.

E. RESEARCH QUESTION #5

Are  there  differences  in  accuracy  that  can  be  attributed  to  experience?  Does  more 
experience  result  in  increased  accuracy? 

The participants averaged 15 years of experience, with a standard deviation of three
years. The maximum was 33 years and the minimum was three years. The median and the
mode were both 14 years. None of the participants is considered inexperienced.

For  the  purpose  of  this  research,  fewer  than  ten  years  is  considered  low 
experience  and  more  than  ten  years  is  considered  high  experience. 

Table 15. Accuracy and Inconclusive Rates for Low-Experience and High-Experience
Participants

                          Low Experience               High Experience
Unweighted Accuracy       .958 (.043) {.873 to 1.042}  .963 (.041) {.883 to 1.044}
Unweighted Inconclusives  .118 (.069) {<.001 to .253}  .136 (.064) {.010 to .262}
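
The text does not state how the bracketed ranges in the table above were computed. As a hedged illustration, the sketch below assumes they are normal-approximation 95% intervals of the form mean plus or minus 1.96 times the parenthesized value, an assumption that roughly reproduces the tabled bounds. Because a proportion cannot fall outside 0 to 1, a second hypothetical helper clamps the interval to that range.

```python
def normal_interval(mean, spread, z=1.96):
    """Return (lower, upper) for mean +/- z * spread.

    Assumes the parenthesized table value plays the role of `spread`;
    the source does not say whether it is a standard error or deviation.
    """
    return mean - z * spread, mean + z * spread


def clamp_to_unit(interval):
    """Clamp an interval to the 0..1 range that a proportion can occupy."""
    lower, upper = interval
    return max(0.0, lower), min(1.0, upper)


# Low-experience unweighted accuracy from the table: .958 (.043).
accuracy_ci = normal_interval(0.958, 0.043)
# Low-experience unweighted inconclusives from the table: .118 (.069).
inconclusive_ci = normal_interval(0.118, 0.069)

print(accuracy_ci, clamp_to_unit(accuracy_ci))
print(inconclusive_ci, clamp_to_unit(inconclusive_ci))
```

Under this assumption, the upper bound above 1 and the lower bound below 0 arise from the approximation itself, which may explain why other tables report such bounds as ">.999" and "<.001".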
Figure 9. Accuracy and Inconclusive Rates for ESS Scores of Low-Experience and
High-Experience Participants (groups: Low Experience, High Experience; series:
Unweighted Accuracy, Unweighted Inconclusives)


Due to differences in sample size and an expected difference in decision and
inconclusive rates, unbalanced one-way ANOVAs were used.

Results between high- and low-experience participants were not significant for
decision accuracy [F(1,10) = 0.009, p = 0.925]. Neither were results significant for
inconclusive results [F(1,10) = 0.037, p = 0.851].
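
For readers unfamiliar with the unbalanced one-way ANOVA, the following minimal sketch computes the F statistic for groups of unequal size. The function name, the group sizes, and the per-scorer accuracy values are hypothetical; they are chosen only so that the degrees of freedom match the F(1,10) form reported above and are not the study data.

```python
def one_way_anova_f(*groups):
    """Return (F, df_between, df_within) for groups that may differ in size."""
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    means = [sum(g) / len(g) for g in groups]

    # Between-group sum of squares weights each group mean by its group size.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    # Within-group sum of squares pools the deviations from each group mean.
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)

    df_between = len(groups) - 1
    df_within = n_total - len(groups)
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical per-scorer decision accuracy for two unequal groups.
low_experience = [0.95, 1.00, 0.91, 0.96]
high_experience = [0.95, 1.00, 0.90, 1.00, 0.95, 0.91, 1.00, 0.96]

f_stat, df_between, df_within = one_way_anova_f(low_experience, high_experience)
print(f"F({df_between},{df_within}) = {f_stat:.3f}")
```

The resulting F value would then be compared with the critical value for the same degrees of freedom to decide significance.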

There  was  no  effect  for  low  or  high  experience  in  this  sample  data.  That  is,  the 
low-experience  participants  scored  polygraph  charts  using  ESS  with  the  same  accuracy 
and  inconclusive  rates  as  high-experience  participants.  This  outcome  is  consistent  with 
that  reported  between  inexperienced  scorers  and  experienced  scorers  by  Blalock, 
Cushman,  and  Nelson  (2009)  and  Krapohl  and  Cushman  (2006). 

F.  RESEARCH  QUESTION  #6 

Does  accuracy  vary  with  the  examiner’s  type  of  employment?  Are  there 
differences  in  accuracy  between  private  examiners  and  those  who  work  for  law 
enforcement  or  government  agencies? 
One federal examiner was combined with the county/local law enforcement group
for a combined group of government employees.


Table 16. Accuracy and Inconclusive Rates for ESS Scores of Private-Practice and
Law Enforcement/Government Participants

                          Private                      LE/Gvt
Unweighted Accuracy       .977 (.026) {.926 to 1.029}  .953 (.045) {.865 to 1.04}
Unweighted Inconclusives  .114 (.079) {<.001 to .268}  .136 (.060) {.020 to .253}

Figure 10. Accuracy and Inconclusive Rates for ESS Scores of Private-Practice and
Law Enforcement/Government Participants (series: Unweighted Accuracy, Unweighted
Inconclusives)

Due to differences in sample size and an expected difference in decision and
inconclusive rates, unbalanced one-way ANOVAs were used.

Results were not significant for decision accuracy [F(1,9) = 0.228, p = 0.644],
nor were results significant for inconclusive results [F(1,9) = 0.053, p = 0.823].

There  was  no  effect  for  type  of  employment  in  this  sample  data,  although  a  larger 
sample  size  may  be  expected  to  produce  different  results. 
VII. DISCUSSION


A.  INTRODUCTION 

Polygraph has been used as a tool by the federal government, the military, and
state and local governments for several decades. It has been and continues to be used as a
successful instrument in national-security issues, homeland security, and the war on
terror. Nevertheless, its detractors and those unfamiliar with its utility and successes, as
well as disagreements and lack of foresight within the profession itself, have caused some
of the agencies and decision makers who do or could benefit from its use to be reluctant
to rely on it. Some of this reluctance and even abandonment, in spite of the lack of
alternatives, is due to outside pressure. The pressure can come in the form of political
pressure, from uninformed law or policy, or from those who believe they have been
wronged or harmed through the use of the polygraph. The challenge then has become
multifold: polygraph proponents must ensure that there is ongoing research to address the
concerns and, in some instances, the valid arguments and criticisms of detractors (some
of whom work in other scientific disciplines), and they must continue to move the
profession onto a sound scientific foundation. Within the profession, infighting and lack
of foresight and vision must be overcome for the sake of ensuring that this tool remains
viable and valuable in its contribution to the homeland's safety and security, regardless of
the form that proven instrumentation and technology takes. It is imperative that decision
makers, policy makers, and other consumers who currently rely on the polygraph (as well
as those who should) be educated by those within the profession who can and should
undertake such a goal.28


28 Polygraph has a long history of infighting within the profession, as the author can attest to from his
25-plus years in the field. This infighting tends to revolve around scientific research and its importance to
the trade. There are those within the field who believe that the profession need not be concerned about what
detractors say about the validity and reliability of the polygraph. This side of the house tends to argue that
"we know it works" and its utility is incontrovertible. The other side of the house argues that, for the field
to survive, the polygraph must continue to build a strong scientific foundation. That is, in order to continue
to serve its important role in national security and law enforcement, it must prove its validity and reliability
so that its worth can be proven to policy makers and legislators in contrast to the naysayers' claims. This
thesis falls on this side of the argument.
Various studies and reviews have been undertaken in regard to hand-scored
polygraph techniques. Two primary hand-scoring techniques in use today are the three-position
and seven-position TDA systems. These two systems employ twelve scoring
features for the purpose of assigning positive values (no deception indicated) or negative
values (deception indicated) when the responses to relevant questions are compared to the
responses to comparison questions. The rules (instructions) for assigning values are complex. The
Empirical Scoring System uses observation of three scoring features for the purpose of
assigning these negative and positive values. The instructions for assigning these values
are simple and rely on the bigger-is-better rule.
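
The sign convention described above can be sketched as follows. This is only an illustration of comparing a relevant-question response with a comparison-question response and assigning a negative value when the relevant response is larger and a positive value when the comparison response is larger. The function names and measurements are hypothetical, and the actual ESS feature weights and decision cutscores are not described in this section and are not modeled here.

```python
def bigger_is_better(relevant, comparison):
    """Assign a sign score from one relevant/comparison response pair.

    Returns -1 when the relevant-question response is larger (toward
    deception indicated), +1 when the comparison-question response is
    larger (toward no deception indicated), and 0 when they are equal.
    """
    if relevant > comparison:
        return -1
    if comparison > relevant:
        return 1
    return 0


def total_score(pairs):
    """Sum the sign scores over (relevant, comparison) response pairs."""
    return sum(bigger_is_better(r, c) for r, c in pairs)


# Hypothetical response magnitudes in arbitrary units.
pairs = [(3.0, 1.5), (2.0, 2.5), (4.0, 1.0)]
print([bigger_is_better(r, c) for r, c in pairs])  # [-1, 1, -1]
print(total_score(pairs))  # -1
```

A complete scoring system would also specify how the summed value is compared against cutscores to reach a decision or an inconclusive result.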

The purpose of this study was to extend the research into the Empirical Scoring
System to see whether it has additional value or is at least the equivalent of other
hand-scoring techniques currently in use. Various research questions were posed, and through
the use and analysis of raw data, comparisons were made between the three-position and
seven-position scoring techniques, arguably two of the most highly utilized scoring
techniques in the polygraph profession. These two techniques have been in use since the
1960s and are taught at the National Center for Credibility Assessment (NCCA),29 as
well as other private and government-funded polygraph schools across the United States
and internationally.30 Previous research on hand-scoring techniques was normative-based,
while research on the ESS is empirically based and has allowed for the assignment
of p-values to the technique. The intent of this study was to conduct additional research
of the ESS, to further determine whether its design and method of use offer advantages
over the compared techniques.

B. DISCUSSION

The research design, primarily through use of ANOVA, was intended to measure
several facets of the ESS as compared to the three-position and seven-position TDA
systems. The study used 22 archival matched random samples of You-Phase
examinations from the confirmed case archive at the Department of Defense. Eleven of
these cases were confirmed truthful examinations. "Confirmed truthful" in the instance of
these 11 examinations means that an alternative person was identified as a suspect or the
examinee was exonerated, as there was evidence or a confession outside the opinion
rendered by the specific polygraphist. Eleven matching confirmed deceptive
examinations were also provided. "Confirmed deceptive" in the instance of these 11
examinations means that there was evidence or a confession outside the opinion rendered
by the specific polygraphist. As per the You-Phase protocol, which is part of the
examination technique, these are single-issue examinations that contain two relevant
questions and three comparison questions, as well as other procedural questions. The
study participants were randomly selected and consisted of three groups (Cohorts 1, 2,
and 3) of four scorers each. The first cohort utilized the ESS-TDA. The second cohort
used the three-position TDA system. The third cohort used the seven-position TDA
system. There were six research questions in the study.

29 NCCA is the Department of Defense's polygraph school, which all federal polygraphists attend as
part of their initial training. It was formerly known as the Department of Defense Polygraph Institute
(DODPI) and later as the Defense Academy for Credibility Assessment (DACA).

30 Currently, the American Polygraph Association, the largest professional polygraph association in
the world, accredits 16 schools in the United States and 13 international schools.

The  first  question  was  to  discover  whether  there  were  differences  in  the  ability  of 
each  TDA  system  to  extract  diagnostic  data  from  the  provided  examinations.  This 
analysis  was  undertaken  through  use  of  ANOVA.  The  ANOVA  analysis  produced  no 
significant  differences  between  the  three  TDA  systems;  each  was  as  capable  as  the  others 
of  extracting  diagnostic  information. 

The next research question was whether there were differences in the three TDA
systems for criterion accuracy (validity). This analysis was conducted through the use of
multivariate ANOVAs and targeted correct decisions, inconclusive rates, and error rates. No
significant differences were found in the three-way interaction. However, the two-way
interaction of criterion dimension x case status was significant. This suggests that the TDA
systems may perform differently with criterion truthful and criterion deceptive cases. This
interaction supports the expectation that correct, inconclusive, and error rates will not
occur in similar proportions. That is, the hit rate should be better than the miss rate and
the indeterminate rate. The two-way interaction showed no significance for sensitivity or
specificity, and this supports the expectation. Although the systems produced different
rates of correct, erroneous, and inconclusive results, there were no significant differences
among the three TDA systems. ESS seems to have better specificity, a lower inconclusive
rate, and an equivalent error rate (which approached zero for all three TDA systems; again, an
unrealistic result that should only be used with caution). It is noted that both the sample
size and the size of the cohorts may have had an effect on this lack of significance. A
larger sample size and larger cohorts may produce significant differences. What can be
said as a result of this research is that ESS appears to have at least the same criterion
accuracy as the three-position and seven-position TDA systems.

The three-position and seven-position TDA scores were transformed to ESS scores
to determine whether the transformation affected the accuracy of the three- and
seven-position scoring systems. This was assessed through a two-way
ANOVA. No significant differences were found between the distributions of the three
TDA systems when the two-way ANOVA was conducted, which suggests that the three
produce similar score distributions when transformed. There was no significant difference
in unweighted accuracy as the result of transformation. In terms of inconclusive rates for
the transformed three-position and seven-position scores, no significant
differences were found. This was an unexpected result. The expectations were that the
inconclusive rates would be higher for both the three-position and seven-position TDA
systems transformed to ESS. This unexpected result is likely the result of a small sample
size. A larger study should produce statistical power that may provide the expected
results.

A fourth research question examined the accuracy of the combined raw scores of
all three TDA systems and of the same scores transformed to ESS scores. There was a
significant main effect for the accuracy dimension, and this was expected, given that it is
known that the proportion of inconclusive results will differ from the proportion of correct
decisions. Decision accuracy increased and inconclusive results decreased. However,
one-way differences were not significant for either decision accuracy or inconclusive results.
It is hypothesized that a larger sample may find significant differences.
The fifth question under study was to discover whether there were differences in
accuracy  based  on  level  of  experience.  A  one-way  unbalanced  ANOVA  was  utilized  for 
analysis,  and  there  were  no  significant  differences  for  decision  accuracy  or  inconclusive 
results  based  upon  experience.  Although  there  were  no  inexperienced  participants  in  any 
of  the  cohorts,  there  were  participants  with  low  experience  and  participants  with  high 
experience.  The  results  seem  to  show  that  there  is  no  effect  in  the  application  of  ESS 
scoring  techniques  based  on  years  of  experience  in  the  field  of  polygraph  scoring. 

A  final  analysis  was  conducted  to  determine  whether  type  of  employment,  private 
or  government,  had  any  effect  on  accuracy.  An  unbalanced  one-way  ANOVA  found  that 
there  were  no  significant  differences  for  decision  accuracy  or  for  inconclusive  results. 
There  was  no  effect  for  type  of  employment  based  on  this  research.  However,  it  is  again 
hypothesized  that  a  larger  sample  size  might  produce  different  results.  The  analysis 
seems  to  indicate  that  type  of  employment  has  no  effect  on  TDA  diagnostics. 

C.  LIMITATIONS 

Sample  size  was  the  primary  limitation  of  the  current  study,  both  in  terms  of 
confirmed  case  sample,  as  well  as  the  number  of  participants.  Larger  sample  sizes  would 
produce  more  statistical  power,  and  it  is  hypothesized  that  they  would  have  an  impact  on 
the  significance  of  some  of  the  findings  of  this  study.  Additionally,  the  study  participants 
were  experienced  polygraphists  who  had  attended  a  continuing  education  seminar  and 
classroom  instruction  on  ESS.  It  cannot  be  concluded  that  these  cohorts  are  representative 
of  the  wider  population  of  polygraphists.  Another  limitation  is  that  it  is  not  known  how 
the  confirmed  cases  came  to  be  selected  into  the  archive,  other  than  being  confirmed 
cases.  The  researchers  intentionally  used  cases  confirmed  by  extra-polygraph  means,  but 
one  must  consider  that  the  selection  may  potentially  lead  to  criterion  accuracy  rates  that 
could  be  overestimated. 

D. RECOMMENDATIONS FOR FUTURE RESEARCH

Several  recommendations  for  future  study  can  be  made  as  the  result  of  this 
research. First, since the sample size was small, it is possible that the greater statistical power of
a larger sample could reveal differences that escaped detection in this project. This larger
sample  size  may  be  of  interest  in  research  by  type  of  employment,  decision  accuracy, 
criterion  accuracy,  and  other  considerations.  It  is  of  note  that  government  polygraphists, 
particularly  federal  government  examiners,  typically  attend  government-sponsored 
polygraph  schools,  while  private  examiners  typically  attend  private  schools,  although 
many  private  examiners  are  retired  government  polygraphists.  The  results  may  reveal 
differences  in  instruction,  expectations,  types  (quality)  of  exams  conducted,  or  overall 
workload  (number  of  tests  conducted),  among  other  possible  variables  that  can  then  be 
studied. 

Another  aspect  of  test  data  analysis  that  may  be  of  interest  is  the  amount  of  time 
required  to  use  the  various  types  of  hand-scoring  techniques.  These  time  studies  can  then 
be  correlated  to  other  demographic  facets  of  the  participants,  again  including  age, 
experience,  and  type  of  employment.  The  p-value  tables  for  ESS  are  well  developed, 
although  their  significance  to  field  polygraphists,  decision  makers,  and  other  consumers 
is  not  well  known. 

Response  bias  is  another  issue  that  was  not  addressed  by  this  study.  It  has  a  direct 
impact  on  sensitivity  and  should  be  further  studied.  Further  research  into  the  potential 
importance  of  this  attribute  of  ESS  and  its  potential  contribution  to  policy  decisions 
should  be  undertaken.  This  research  suggests  that  ESS  can  at  least  complement,  if  not 
supplant,  the  two  compared  TDA  systems,  and  perhaps  others,  to  increase  the  value  of 
polygraph  to  homeland  security  and  the  war  on  terror.  Further  research  should  be 
conducted  into  this  potential. 

E.  CONCLUSION 

The  first  conclusion  that  can  be  drawn  from  this  study  is  that  ESS  has  at  least  as 
much  diagnostic  ability  as  the  three-position  TDA  system  and  the  seven-position  TDA 
system,  even  taking  into  consideration  the  newness  of  the  ESS  to  the  polygraph 
profession. Despite the limitations of the small sample size, the study provides partial
evidence that ESS has consistently high criterion accuracy. The study hints
that ESS has better specificity, lower inconclusive rates, and error rates at least equivalent
to those of the three-position and seven-position TDA systems. Lastly, the study seems
to support past research into ESS indicating that length of experience has no impact on the
ability of the polygraphist to apply ESS scoring rules. This offers a particular advantage
over  the  more  complex  scoring  systems,  which  include  more  features  and  scoring  rules, 
since  graduates  of  polygraph  schools  seldom  have  time  to  ease  into  their  new  jobs.  That 
is,  the  new  graduate  of  a  polygraph  school  can  typically  expect  that  soon  after  his 
assignment,  he  will  undertake  a  polygraph  examination  that  has  high  impact  and 
consequences.  The  impact  and  consequences  can  literally  save  or  cost  lives,  determine 
the  future  course  of  major  tactical  plans  and  actions,  or  forever  change  the  lives  of 
individuals. 

It is imperative that those who can impact the use of the polygraph in the United
States continue to pursue the lofty goal of sound and scientifically based lie detection
techniques, procedures, instrumentation, and technology. Consideration must be given to
programs and projects that will get the information about best practices into the hands of
practitioners, decision makers, and consumers, some of whom have little knowledge
about the abilities or contributions of the polygraph. There are decision makers (such as
military officers, police chiefs, judges and prosecutors, government program directors,
government officials, and others) who make decisions based upon polygraph
examinations and subsequently rendered opinions but who have never been educated about the
polygraph. They do not know that there are methods available, based on well-founded
chosen policy decisions, that will better provide them with the information that they want
to have, taking into account the sensitivity, specificity, error rates, and inconclusive rates
that ESS offers. It could well be worthwhile for the profession to develop educational
seminars to inform stakeholders about these considerations. This could be a particularly
worthwhile endeavor for the American Polygraph Association and the American
Association of Police Polygraphists. Given that these two associations already have
networks with various stakeholders, such as the Department of Defense, the International
Association of Chiefs of Police, the National Sheriffs' Association, and their state
counterparts, short programs could be developed to introduce these ideas at their
respective conferences and meetings and provide follow-up through articles in their
widely circulated professional publications. It is also important to reach rank-and-file
personnel who are actually in the field and may be unaware of the current state of the
polygraph and best practices. The author is aware of state law enforcement academies
that address the polygraph in basic courses as well as continuing education courses. There
are analogous educational undertakings in the military and other programs through which
these short, informational classes could be offered.

In  terms  of  the  profession  itself,  there  must  be  a  major  internal  push  to  continue 
the  research  that  has  been  undertaken  in  the  last  several  years.  We  must  keep  our  eye  on 
the  target,  and  that  target  cannot  be  misidentified.  As  the  National  Research  Council 
suggested  in  its  seminal  report,  the  concern  must  be  on  national  security  and,  by 
implication,  homeland  security  and  the  war  on  terror.  If  research  into  lie  detection  and 
other  social  sciences  identifies  better  methods,  instrumentation,  technologies,  and 
techniques,  then  they  must  be  further  studied  and  embraced,  if  proven,  even  at  the 
expense  of  letting  go  of  what  we  know  and  what  gives  us  comfort. 

In  terms  of  the  present,  careful  consideration  must  be  given  as  to  how  to  keep 
current  practitioners  within  the  bounds  of  known  best  practices.  Scientific  and  scholarly 
research  and  peer-reviewed  articles  are  part  of  that  equation;  however,  one  must  not  lose 
sight  of  the  polygraphist  in  the  field  whose  primary  concern  is  learning  today  what  can  be 
applied  tomorrow.  The  science  and  research  must  be  translated  and  presented  in  such  a 
way  that  these  individuals  take  an  interest  in  it,  understand  it,  and  apply  it. 

Lastly,  this  study  supports  some  of  the  findings  of  previous  research  into  the 
Empirical  Scoring  System.  It  seems  to  support  the  position  that  ESS  can  complement  the 
three-position  TDA  and  the  seven-position  TDA  systems  and  potentially  others.  This 
study did not find that ESS improved the scoring ability of the polygraphists. It does
support the position that ESS offers polygraph consumers the ability to choose their
own tolerance for risk, something that is not readily available with other scoring systems.
This ability for the consumer to choose levels of risk when relying on the polygraph is
important,  but  often  not  understood,  and  it  can  play  a  valuable  role  in  homeland  security 
and  the  war  on  terror. 

No  theory  is  going  to  be  inviolate.  Let  me  put  it  clearly.  The  only  kind  of 
theory  that  can  be  proposed  and  ever  will  be  proposed  that  absolutely  will 
remain  inviolate  for  decades,  certainly  centuries,  is  a  theory  that  is  not 
testable.  If  a  theory  is  at  all  testable,  it  will  not  remain  unchanged.  It  has  to 
change.  All  theories  are  wrong.  One  does  not  ask  about  theories,  can  I 
show  that  they  are  wrong  or  can  I  show  that  they  are  right,  but  rather  one 
asks,  how  much  of  the  empirical  realm  can  it  handle  and  how  must  it  be 
modified  and  changed  as  it  matures?  (Chadee,  2011,  quoting  Leon 
Festinger, 1987)

APPENDIX A. SIMPLIFIED HANDSCORING PROCEDURES


Instructions 

Score each case from the computer screen, without printing the charts and without the use
of  any  mechanical  or  computerized  measurement  devices. 

Do  not  attempt  to  render  a  decision  for  any  of  the  cases.  Limit  your  activity  to  scoring  the 
data  and  assigning  points.  The  investigators  will  use  your  data  to  determine  optimal  cut- 
scores. 

Physiological  Signals 

1. Respiratory Suppression (decreased RLL)

• Decrease in respiration amplitude for three respiratory cycles, beginning after
the stimulus onset

• Decrease in respiration rate (slowing) for three respiratory cycles, beginning
after the stimulus onset

• Temporary increase in respiratory baseline (non-RLL feature) for three
respiratory cycles, beginning after the stimulus onset

2.  Electrodermal  amplitude  of  increase 

3.  Cardiograph  amplitude  of  increase,  measured  at  the  diastolic  baseline 

Interpretation  Rules 

1. Assign values of +, - or 0 using the 3-position system and the “bigger is better”
principle

• Score each relevant question against the stronger of the bracketing comparison
questions, for each component sensor

•  Do  not  be  concerned  about  traditional  scoring  ratios 

•  If  you  can  visually  determine  that  one  segment  is  larger  than  another,  then  you 
can  assign  a  point 

• For EDA, assign a value of +2, -2 or 0 in accordance with the ESS rules.

2. Score only timely reactions

• Do not be concerned about traditional scoring periods

• Do not score reactions that begin before the stimulus

• Do not score reactions that begin long (several seconds) after the stimulus or
answer

3. Score only normal interpretable data

• Do not attempt to score data that are affected by movement artifacts

•  Do  not  attempt  to  score  messy  or  unstable  segments  of  data 

•  Do  not  attempt  to  score  data  of  unusual  response  quality  (dampened  or 
exaggerated) 
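
The interpretation rules above can be sketched in code. This is an illustration only, not part of the instructions given to scorers. It assumes each reaction has already been reduced by eye to a single hypothetical magnitude (larger meaning a stronger reaction), and the function names and inputs are invented for the example. The sign convention follows the glossary: negative values mean greater responding to the relevant question.

```python
def score_component(relevant, comp_before, comp_after, sensor, scorable=True):
    """Score one relevant question on one sensor.

    relevant, comp_before, comp_after: hypothetical reaction magnitudes
    (larger = stronger reaction) for the relevant question and the two
    bracketing comparison questions.
    sensor: "P" (respiration), "E" (electrodermal) or "C" (cardiograph).
    scorable: False for untimely, artifacted, or unstable data (rules 2 and 3).
    """
    if not scorable:
        return 0
    comparison = max(comp_before, comp_after)  # stronger bracketing comparison
    if relevant > comparison:
        sign = -1  # greater response to the relevant question
    elif relevant < comparison:
        sign = 1   # greater response to the comparison question
    else:
        sign = 0   # no visible difference
    weight = 2 if sensor == "E" else 1  # EDA scores are +2, -2 or 0
    return sign * weight


def total_score(rows):
    """Sum scores over (relevant, comp_before, comp_after, sensor) tuples."""
    return sum(score_component(*row) for row in rows)
```

A scorer's sheet for one chart would then be a short list of such tuples, one per relevant question and sensor, passed to `total_score`.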

APPENDIX B. 3-POSITION HANDSCORING PROCEDURES


Instructions 

Score  each  case  from  the  computer  screen,  without  printing  the  charts  and  without  the  use 
of  any  mechanical  or  computerized  measurement  devices. 

Do  not  attempt  to  render  a  decision  for  any  of  the  cases.  Limit  your  activity  to  scoring  the 
data  and  assigning  points.  The  investigators  will  use  your  data  to  determine  optimal  cut- 
scores. 


Physiological  Signals 

Evaluate physiological signals in accordance with the Federal 3-position test data analysis
(TDA) scoring system.

APPENDIX C. 7-POSITION HANDSCORING PROCEDURES


Instructions 

Score  each  case  from  the  computer  screen,  without  printing  the  charts  and  without  the  use 
of  any  mechanical  or  computerized  measurement  devices. 

Do  not  attempt  to  render  a  decision  for  any  of  the  cases.  Limit  your  activity  to  scoring  the 
data  and  assigning  points.  The  investigators  will  use  your  data  to  determine  optimal  cut- 
scores. 


Physiological  Signals 

Evaluate physiological signals in accordance with the Federal 7-position test data analysis
(TDA) scoring system.

APPENDIX D. PARTICIPANT DEMOGRAPHIC DATA SHEET


Please provide your information as follows. You only need to do
this step for a single score sheet.

RANDOM NUMBER ASSIGNED:

GENDER: □ M □ F AGE:

POLYGRAPH EXPERIENCE: Years (rounded to closest year)

POLYGRAPH SCHOOL ATTENDED:

HIGHEST LEVEL of EDUCATION: □ High School □ Associate □ Bachelor’s □
Master’s □ PhD

EMPLOYER: □ Federal □ State □ County/Local □ Private

APPENDIX E. SCORING SHEET


DATE:

SCORE SHEET

CHART ID#:

Chart ID   |     | Question ID: R5 | Question ID: R7
Chart 1    |  P  |                 |
           |  E  |                 |
           |  C  |                 |
Chart 2    |  P  |                 |
           |  E  |                 |
           |  C  |                 |
Chart 3    |  P  |                 |
           |  E  |                 |
           |  C  |                 |
Spot Total (3 charts)  |           |

FINAL OPINION (Check one): □ NDI □ DI □ INC

CONFIDENCE IN YOUR OPINION (Check one): □ Low □ Average □ High

APPENDIX F. GLOSSARY (note 31)


3-position scale: Abbreviated form of the 7-position scale for polygraph test data
analysis. The major difference is that the range of values for each comparison is from -1
to +1, rather than the range of -3 to +3 in the 7-position scoring system.

7-position scale: System of assigning values to individual physiological
responses in polygraph, based on differential responding to relevant and comparison
questions. The values in 7-position scoring are whole numbers between -3 and +3. By
convention, negative values represent greater responding to relevant questions, while
positive values indicate greater responses to comparison questions. A zero usually
indicates equal or no reactions to the relevant and comparison questions, or that the
spot does not meet minimum standards for interpretation. The assigned numbers are
summed across all three polygraph parameters for each question for all spots and all
charts. There are thresholds for determinations of truthfulness or deception, with an
inconclusive region separating them. In the polygraph literature, the 7-position scale is
sometimes referred to as a semi-objective scoring system. There are three major versions
of the 7-position scoring system: Backster, Utah, and DoDPI (now NCCA).

American Association of Police Polygraphists (AAPP): Professional
organization dedicated to serving the needs of criminal justice and military
polygraph examiners. Founded in 1977, AAPP has about 1,200 members.

American Polygraph Association (APA): Professional organization made up of
polygraph professionals from law enforcement, government, and the private sector.
Incorporated in 1966 in Washington, D.C., the APA resulted from the merger of
several polygraph associations, including the Academy of Scientific Interrogation, the
American  Academy  of  Polygraph  Sciences,  the  National  Board  of  Polygraph 
Examiners,  the  International  Association  of  Polygraph  Examiners,  and  the 
International  Association  for  Polygraph  Research.  It  currently  has  about  2,400  members 
and  is  headquartered  in  Chattanooga,  Tennessee. 

ANOVA: ANalysis Of VAriance. A family of statistical procedures designed to
partition the total amount of variability in a set of scores into two parts: the parts that can
and cannot be accounted for by the independent variable(s). The ANOVA is often used
in psychophysiologic and polygraph research.
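
The partition described in this entry can be illustrated with a one-way ANOVA computed by hand. The sketch below is a generic textbook calculation, not the analysis code used in this study. The three score lists are invented values standing in for three scoring cohorts.

```python
def one_way_anova(groups):
    """Return (ss_between, ss_within, f_ratio) for a list of score lists.

    ss_between is the variability accounted for by group membership;
    ss_within is the variability that group membership cannot account for.
    Assumes at least two groups and some within-group variability.
    """
    all_scores = [x for g in groups for x in g]
    grand_mean = sum(all_scores) / len(all_scores)
    means = [sum(g) / len(g) for g in groups]
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, means) for x in g)
    df_between = len(groups) - 1
    df_within = len(all_scores) - len(groups)
    f_ratio = (ss_between / df_between) / (ss_within / df_within)
    return ss_between, ss_within, f_ratio


# Hypothetical scores for three cohorts (illustrative values only).
cohorts = [[1, 2, 3], [2, 3, 4], [3, 4, 5]]
ss_b, ss_w, f = one_way_anova(cohorts)
```

For these invented values the total variability of 12 splits evenly, 6 between groups and 6 within groups, giving an F ratio of 3 on 2 and 6 degrees of freedom.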

Backster, Cleve: Originator of the Trizone Comparison Test, more often referred
to as the Zone Comparison Technique. Backster also introduced to the polygraph
profession the concepts of “psychological set,” zones, spots, superdampening, anticlimax
dampening, symptomatic (outside issue) questions, exclusionary comparison questions,
and 7-position scoring for use in chart analysis. Backster’s concepts have been widely
adopted into practice in polygraph, and there are several contemporary methods that
can be traced back to Backster’s original design. Backster heads a private training
facility in San Diego, California, and has provided training for thousands of examiners
since the late 1940s. He also began the C.I.A. polygraph program in 1949.

Note 31: This glossary was adapted in large part from the “Terminology Reference for the Science of
Psychophysiological Detection of Deception” (Krapohl & Sturm, 2002b). It is used with permission of the
editor of Polygraph, the journal of the American Polygraph Association.

bias:  In  research  it  is  a  source  of  systematic  error  that  can  influence  the 
outcome  of  the  experiment.  Bias  can  be  introduced  into  research  by  factors  such  as, 
among  others,  nonrandom  sampling,  faulty  subject  instructions,  or  expectations  of  the 
researcher  or  the  participants.  In  a  polygraph  study  looking  at  the  validity  of  blind 
scoring, for example, a researcher using only cases which were verified by the original
examiner is likely biasing the study, since cases in which the original examiner
made  the  wrong  diagnosis  could  be  systematically  excluded  from  the  research 
sample.  Researchers  attempt  to  control  bias  through  experimental  design. 

blind  chart  analysis:  Evaluation  of  polygraph  recordings  without  the  benefit  of 
extrapolygraphic information, such as subject behavior, case facts, pretest admissions,
base  rates  of  deception,  etc.  Studies  employ  various  degrees  of  “blindness.”  It  is  a 
popular  research  approach  to  gauge  interrater  reliability.  Assessments  of  the  accuracy  of 
polygraph  test  evaluation  techniques  also  use  blind  chart  analysis. 

cardiograph:  General  term  for  any  recording  of  heart  activity.  In  polygraph 
the  use  of  a  blood  pressure  cuff  to  monitor  relative  arterial  blood  pressure  changes 
and  pulse  wave  is  more  precisely  described  as  sphygmography  (recording  of  the  arterial 
pulse)  or  occlusion  plethysmography  (partial  blockage  of  circulation  to  measure  volume 
changes  in  a  body  part).  While  cardiograph  is  not  incorrect  in  this  context,  it  lacks 
precision  in  denoting  the  actual  phenomenon  being  recorded  in  polygraph.  The  term 
cardiograph  in  the  psychophysiological  and  medical  literature  most  often  refers  to  the 
electrocardiograph. 

chart:  Graphical  record  of  phenomena.  In  polygraph  it  refers  to  the  polygram 
on  which  is  recorded  the  physiological  activity  during  testing.  The  term  chart  is 
sometimes  used  interchangeably  with  test. 

chart  markings:  Annotation  of  the  physiologic  tracings  to  denote  stimulus 
(question)  onset  and  offset,  examinee’s  answer,  question  number,  question  label, 
artifacts,  and  other  details  important  to  the  interpretation  of  the  physiological  data. 

Comparison  Question  Technique  (CQT):  An  umbrella  term  for  standard 
testing  formats  that  use  probable-lie  or  directed-lie  comparison  questions.  Included  are 
the Reid, the MGQT, the Zone Comparison, the Positive Control, the Utah, the Arthur,
the Quadri-Track, and the Test for Espionage and Sabotage. None of the following are
considered CQTs: Relevant/Irrelevant, Peak of Tension, and Concealed Information
Tests.

control question: Superseded term, now called the comparison question. Class of
questions  used  in  deception  examinations  that  serves  to  elicit  physiologic  responses  of 
innocent  examinees.  There  are  several  types,  such  as  the  exclusionary,  non-exclusionary, 
probable-lie,  directed-lie,  the  positive,  and  minor  variations.  The  term  “control”  in 
polygraph  traces  its  roots  to  the  1930s  and  to  what  are  now  called  stimulation  tests. 
These  tests  were  used  as  “controls”  for  the  production  of  deception  response  patterns 
that  would  later  be  compared  with  responses  to  relevant  questions  in  the 
Relevant/Irrelevant  technique.  In  1947  John  Reid  published  a  paper  in  which  he  referred 
to  two  types  of  questions  as  controls — one  he  called  a  “guilt  complex”  and  the  other  a 
“comparative  response”  question,  the  latter  being  a  probable-lie  question.  The 
“comparative  response”  question  was  called  a  “control  question”  in  a  paper  published  by 
Fred  Inbau  in  1948,  and  the  name  became  the  standard  terminology  in  polygraph  for 
nearly  50  years.  This  was  not  the  first  use  of  this  class  of  question,  however.  Walter 
Summers used similar questions, which he labeled emotional standards, with his
Pathometer technique as early as 1939, and they were used by New York State Troopers
from  1939  until  at  least  1952.  Elizabeth  Marston,  widow  of  William  Marston,  and  Olive 
Richard,  Marston’s  former  secretary,  reported  that  they  had  participated  in  deception 
examinations  with  Marston  some  years  before  in  which  “hot”  questions  were  used  for 
comparison.  A  typical  hot  question  would  be,  “Did  you  ever  think  of  stealing  money 
from  that  safe?”  Elizabeth  stated  during  an  interview  that  they  did  not  believe  it  wise  to 
publish  these  types  of  questions,  and  consequently  they  have  not  been  generally  credited 
with  this  contribution  to  the  science.  Beginning  in  the  1970s,  critics  of  polygraph  noted 
that  the  word  “control”  as  used  in  polygraph  tests  did  not  meet  the  criteria  of  the  term  as 
used  in  science.  The  term  has  since  been  replaced  by  comparison  question  in 
publications  of  the  American  Polygraph  Association,  American  Society  for  Testing  and 
Materials,  federal  polygraph  programs,  and  scientific  papers. 

Deception  Indicated  (DI):  Along  with  NDI  (No  Deception  Indicated)  and 
Inconclusive,  a  conventional  term  for  a  polygraph  outcome.  A  decision  of  DI  in 
polygraph  means  that  (1)  the  physiological  data  are  stable  and  interpretable,  and  (2) 
the  evaluation  criteria  used  by  the  examiner  led  him  to  conclude  that  the  examinee  is  not 
wholly  truthful  to  the  relevant  issue.  The  DI  and  NDI  decision  options  are  used 
primarily in single-issue testing, and they correspond with SPR (Significant
Physiological Responses) and NSPR (No Significant Physiological Responses) in
multiple-issue, or screening, examinations with the US Government.

decision rule: Generically, decision rules determine when data meet the criteria
for inclusion in a particular category. Decision rules are the final step in polygraph
numerical scoring, producing decisions of Deception Indicated, No Deception Indicated,
and Inconclusive. Optimal decision rules require the following: tracing feature
selection; development of best scoring rules; consideration given for base rates;
assessing  and  weighting  collateral  or  countervailing  information;  and  performance  of  a 
cost  and  benefit  analysis  to  determine  the  achievable  level  of  accuracy  and  errors  that 
meet  the  needs  of  the  consumer.  In  polygraphy,  feature  selection  and  scoring  rules  have 
been  thoroughly  investigated.  There  are  also  decision  rules  in  some  polygraph  analysis 
systems  that  include  extra-polygraphic  information  as  part  of  the  decision  process, 
though  there  is  no  validated  method  yet  published.  However,  prevailing  decision  rules 
do  not  consider  the  base  rate  issue,  nor  has  a  formal  cost-benefit  analysis  been 
published  that  identifies  the  appropriate  cutting  scores  for  a  set  of  conditions. 
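
As a rough illustration of the final step this entry describes, the sketch below maps a summed numerical score to one of the three outcomes. The cutoff values are placeholders chosen for the example, not validated or recommended cutting scores, and the function name is hypothetical.

```python
def decide(total, di_cutoff=-4, ndi_cutoff=4):
    """Map a summed score to DI, NDI, or INC.

    di_cutoff and ndi_cutoff are placeholder values for illustration.
    Negative totals reflect greater responding to relevant questions.
    """
    if total <= di_cutoff:
        return "DI"    # Deception Indicated
    if total >= ndi_cutoff:
        return "NDI"   # No Deception Indicated
    return "INC"       # Inconclusive: neither threshold was met
```

Moving the two cutoffs closer together or farther apart changes the share of inconclusive results and the balance of errors, which is the kind of trade-off a cost and benefit analysis would weigh.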

Degrees  of  freedom  (df):  For  any  set  of  values,  every  value  within  a  set  can  be 
freely  selected  except  the  last,  which  is  determined.  Or,  in  other  words,  when  there  is 
only  one  value  remaining,  the  final  selection  is  not  free  to  vary.  Technically,  the 
concept  of  degrees  of  freedom  refers  to  the  number  of  independent  observations  minus 
the  number  of  parameters  being  estimated.  The  degrees  of  freedom  are  essential  in  the 
calculation  of  the  threshold  or  critical  value  of  a  test  distribution. 
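
A small sketch may make the "last value is determined" idea concrete. It assumes the mean of the set is already fixed; the function name is hypothetical, and the ages are the ones used in this glossary's entry for the mean.

```python
def determined_value(free_values, mean, n):
    """Given n - 1 freely chosen values and a fixed mean, return the last value.

    Once the mean is fixed, the final value is not free to vary:
    it must equal n * mean minus the sum of the other values.
    """
    return n * mean - sum(free_values)


# Four ages chosen freely; with a fixed mean of 24.2 the fifth must be about 29.
last_age = determined_value([19, 23, 28, 22], mean=24.2, n=5)
```

With five observations and one estimated parameter (the mean), four degrees of freedom remain.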

Department  of  Defense  Polygraph  Institute  (DoDPI):  See  National  Center  for 
Credibility  Assessment  (NCCA). 

discontinuous blood pressure method: Deception test procedure developed by
William  Marston  before  1915.  Marston’s  instrumentation  was  a  standard 
sphygmomanometer  that  he  used  to  take  intermittent  systolic  blood  pressure 
measurements  during  questioning  on  relevant  and  irrelevant  topics.  He  plotted  these 
measurements  by  hand,  creating  a  curve  that  was  interpreted  for  assessing  deception.  In 
1923  Marston  attempted  to  have  the  results  of  his  deception  test  entered  into  evidence  in 
a  murder  trial  in  Washington,  DC.  The  Frye  case,  which  was  the  first  to  consider 
deception  tests,  established  the  precedent  for  exclusion  of  “lie  detector”  results.  The 
discontinuous  blood  pressure  method  did  not  enjoy  widespread  field  acceptance,  and 
there  are  no  reports  of  its  use  after  the  1930s.  In  the  1920s,  William  Marston 
included  a  cardiopneumo  polygraph  to  augment  his  discontinuous  blood  pressure 
method.  In  practice,  Marston  and  his  wife,  Elizabeth,  would  either  ask  the  examination 
questions  or  take  the  blood  pressure  measurements,  while  Olive  Richard,  an  assistant, 
operated  the  equipment.  If  a  stenographer  were  present,  there  were  four  participants  in 
the  administration  of  the  examination  in  addition  to  the  examinee.  While  William 
Marston  was  usually  the  examiner,  Elizabeth  Marston  and  Olive  Richard  did  conduct 
examinations  on  occasion  without  him,  making  them  the  first  women  in  this  field. 
Given  the  great  methodological  and  instrumentation  differences,  Marston’s 
discontinuous  blood  pressure  method  is  not  truly  in  the  lineage  of  modern 
polygraphy,  though  it  is  frequently  included  in  history  lessons  at  polygraph  schools. 

electrodermal activity (EDA): All exosomatic and endosomatic changes in the
electrical properties of the skin.

endosomatic: Something produced from within the body itself. One type of
electrodermal response, skin potential, is produced by electrical activity generated by the
dermis. Similarly, EEG signals are generated by bioelectric processes in the brain, and
EKG from the heart. For contrast, see exosomatic.

examination: The entirety of the polygraph process, including pretest, test,
and  post-test  elements,  from  onset  to  completion. 

exosomatic:  Something  generated  from  outside  the  body.  Both  skin  conductance 
and  skin  resistance  are  exosomatic  measures  because  electrical  currents  are  applied 
from  outside  sources  to  detect  the  electrodermal  activity.  As  opposed  to  endosomatic. 

extrapolygraphic:  That  which  is  not  derived  exclusively  from  the  polygraph 
tracings.  Some  polygraph  schools  teach  that  there  are  sources  of  information  to  assist  the 
polygraph  examiner  in  rendering  a  decision  that  are  not  registered  on  the  strip 
recordings.  These  sources  of  extrapolygraphic  information  include  case  facts,  behavioral 
indicators,  and  base  rates.  Blind  interpretation  of  polygraph  charts  is  one  way  of  parsing 
out  what  information  is  available  in  the  test  recordings  and  that  which  comes  from  other 
sources. 

false negative: The failure to detect the presence of a particular event or item. A
false  negative  in  polygraph  refers  to  the  incorrect  decision  that  deception  was  not 
practiced  by  the  examinee. 

false  positive:  The  false  detection  of  something  that  is  not  actually  present.  In 
polygraph,  it  is  the  incorrect  decision  that  deception  was  practiced  by  the  examinee. 
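
The two error types can be sketched as a small classification of each case by ground truth and polygraph decision. The labels and function name below are assumptions made for the example.

```python
def classify(ground_truth, decision):
    """Label one case given its ground truth and the polygraph decision.

    ground_truth: "deceptive" or "truthful" (hypothetical labels).
    decision: "DI", "NDI", or "INC".
    """
    if decision == "INC":
        return "inconclusive"
    if ground_truth == "deceptive":
        # A deceptive examinee scored NDI is a missed detection.
        return "true positive" if decision == "DI" else "false negative"
    # A truthful examinee scored DI is a false alarm.
    return "false positive" if decision == "DI" else "true negative"
```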

feature: In polygraphy, the term refers to a specific waveform, pattern or
measurement in a tracing. Features are the fundamental components of chart
interpretation, on which scoring and decision rules depend. Currently there are 11
individually validated manual scoring features. In the respiration channel they are:
apnea, baseline increase, suppression, and increase in cycle time (slowing). For the
electrodermal channel they are peak amplitude, complexity, and duration. In the
cardiograph, the features are amplitude and duration. The finger plethysmograph relies
on the duration and magnitude of the constriction of the pulse amplitude. Other
features are sometimes taught as part of validated scoring systems, though their
efficacy has not yet been established.

field  research:  Scientific  investigation  using  actual  polygraph  cases  conducted 
by  practicing  examiners  on  suspects,  witnesses,  and  victims.  In  contrast  to  laboratory 
research. 

forensic psychophysiological detection of deception examination: A process
that encompasses all activities that take place between a forensic psychophysiologist and
an examinee during a specific series of interactions. These interactions include the
pretest interview; the use of the polygraph to collect physiological data from the
examinee while presenting a series of tests; the diagnostic phase, which includes the
analysis  of  physiological  data  in  correlation  with  the  questions  asked  during  each  test  to 
support  a  diagnostic  decision;  and  the  post-test  phase,  which  may  or  may  not  include 
interrogation  of  the  examinee. 

forensic psychophysiologist: A person who has successfully completed an
academic  program  in  Forensic  Psychophysiology,  including  the  appropriate  internship, 
which  has  been  inspected  and  accredited  by  the  American  Polygraph  Association. 

forensic  psychophysiology:  Defined  by  Dr.  William  J.  Yankee  in  1992  as  the 
science  that  deals  with  the  relationship  and  applications  of  psychophysiological  detection 
of  deception  (polygraph)  tests  to  the  legal  system.  It  is  the  academic  discipline  that 
provides  the  student,  the  practitioner,  and  the  researcher  with  the  theoretical  and  applied 
psychological,  physiological,  and  psychophysiological  fundamentals  for  a  thorough 
understanding  of  polygraph  tests,  and  the  skills  and  qualifications  for  conducting 
polygraph  examinations.  The  modifier  “forensic”  delineates  and  delimits  this  discipline 
from  the  broader  discipline  of  psychophysiology. 

Galvanic  Skin  Response  (GSR):  A  superseded  term  for  the  electrodermal 
response  measured  exosomatically  by  the  change  in  the  electrical  resistance  of  skin. 
GSR  is  sometimes  erroneously  called  Galvanic  Skin  Resistance  or  Galvanic  Skin 
Reflex.  The  modern  term  is  electrodermal  response  (EDR). 

galvanograph: Polygraph component responsible for producing the graphic
recording  of  skin  resistance. 

ground truth: Reality. In the polygraph context it is the veridical state of
truthfulness  or  deception  against  which  polygraph  outcomes  are  compared  in  validity 
studies.  Ground  truth  is  an  elusive  feature  in  field  studies  because  it  is  difficult  to 
independently  verify  guilt  or  innocence  in  many  cases.  In  laboratory  studies,  it  is 
delineated  into  programmed  guilty  and  programmed  innocent  groups. 

inconclusive: Polygraph outcome where testing was completed, but neither
deception  nor  truthfulness  can  be  diagnosed  because  the  physiological  data  are 
inconsistent,  inadequate,  artifacted,  or  contaminated.  There  is  disagreement  whether  an 
inconclusive  outcome  should  be  considered  an  error  when  computing  validity  of 
polygraph.  Some  argue  that  examinees  are  either  truthful  or  deceptive,  but  never 
inconclusive;  therefore,  such  an  outcome  is  necessarily  in  error.  Conversely,  in  the 
forensic  sciences  it  has  been  asserted  that  the  inconclusive  outcome  is  used  to  assess 
utility,  but  not  validity,  because  samples  in  forensic  disciplines  are  often  inadequate,  or 
contaminated.  For  example,  fingerprint  data  is  more  frequently  inadequate  than 
adequate, though fingerprint analysis is considered highly accurate in spite of the
relatively modest percentage of cases in which it can render a positive identification.
Because  of  this  controversy,  polygraph  validity  studies  report  accuracies  both  with  and 
without  inconclusive  results.  In  practice,  inconclusive  outcomes  are  the  default  results 
when  the  criteria  for  deception  or  not-deception  decisions  are  not  satisfied.  Alternate 
term  is  indefinite,  or  no  opinion. 
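
Reporting accuracy both with and without inconclusive results amounts to changing the denominator. The sketch below shows the two calculations; the counts are invented for illustration and are not results from this study.

```python
def accuracy_rates(correct, errors, inconclusive):
    """Return (accuracy counting inconclusives, accuracy excluding them).

    The first rate divides correct decisions by all cases, so every
    inconclusive lowers it. The second divides by decided cases only.
    """
    decided = correct + errors
    total = decided + inconclusive
    return correct / total, correct / decided


# Hypothetical counts: 16 correct, 2 errors, 4 inconclusive out of 22 cases.
with_inc, without_inc = accuracy_rates(16, 2, 4)
```

With these invented counts the rate is about 73 percent when inconclusives are counted and about 89 percent when they are excluded.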

indefinite:  See  inconclusive. 

irrelevant  question:  A  question  designed  to  be  emotionally  neutral  to 
examinees.  Irrelevant  questions  are  most  often  placed  in  the  first  position  of  a  question 
list,  because  an  orienting  response  of  no  diagnostic  value  usually  follows  the 
presentation  of  the  first  question.  In  CQT  formats  it  is  also  used  after  a  relevant  or 
comparison  question  that  has  elicited  a  strong  response  so  as  to  permit  physiologic 
arousal  levels  to  return  to  baseline  before  presenting  another  diagnostic  question. 
Irrelevant  questions  are  used  in  nearly  every  type  of  polygraph  test.  Also  called  norms 
or  neutrals. 

lie  detector:  A  common  but  inaccurate  term  for  the  polygraph. 

Lombroso,  Cesare:  Italian  physician  and  biologist  who  first  employed 
instrumentation  to  the  problem  of  detecting  deception  in  criminals,  and  employed  it 
with live criminal cases. He reported in 1885 in the second edition of his book, L’Homme
Criminel,  the  use  of  the  “hydrosphygmograph,”  a  mechanical  arrangement  invented  for 
medical  purposes,  to  detect  blood  pressure  changes  during  interrogation.  One  of  his 
students,  Angelo  Mosso,  also  went  on  to  perform  instrumental  deception  detection 
experiments. 

Lykken,  David  T.:  Psychologist  and  ardent  critic  of  the  CQT.  Dr.  Lykken  has 
produced  numerous  writings  for  the  scientific  and  general  press,  including  a  book,  A 
Tremor  in  the  Blood,  in  which  he  argued  strongly  that  the  CQT  is  fatally  flawed,  that  it 
resulted  in  wrongful  criminal  convictions,  and  it  was  vulnerable  to  countermeasures  by 
the  guilty.  Dr.  Lykken  did  not  publish  any  research  of  his  own  on  the  CQT,  but  used 
anecdotal  histories  and  interpretations  of  other  research  to  form  his  arguments.  Lykken 
endorses  the  Guilty  Knowledge  Test  (GKT),  an  alternate  polygraph  testing  format.  The 
GKT  has  never  been  widely  used  in  the  field. 

Lykken  scoring:  System  of  scoring  electrodermal  responses  in  the  Concealed 
Information  Test  (formerly  the  Guilty  Knowledge  Test)  and  establishing  the  threshold 
for  rendering  decisions.  The  Lykken  scoring  system  compares  the  responses  of  the 
critical  test  items  in  a  rank  order  method  against  those  of  the  neutral  items.  One  variant 
uses  averaged  ranks. 
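
The rank-order comparison can be sketched as follows. This assumes one commonly described variant, in which a question earns 2 points when the critical item draws the largest electrodermal response and 1 point when it draws the second largest. The point values, the tie handling, and the function names are assumptions made for the example, and the amplitudes are hypothetical.

```python
def lykken_item_score(critical, neutrals):
    """Score one question by ranking the critical item against neutral items.

    critical: response amplitude to the critical item.
    neutrals: response amplitudes to the neutral items.
    Ties are resolved in favor of the critical item (an assumption).
    """
    larger = sum(1 for n in neutrals if n > critical)
    if larger == 0:
        return 2  # critical item drew the largest response
    if larger == 1:
        return 1  # critical item drew the second-largest response
    return 0


def lykken_total(questions):
    """Sum item scores over (critical, neutrals) pairs."""
    return sum(lykken_item_score(c, ns) for c, ns in questions)
```

The total is then compared against a decision threshold, which this sketch leaves out.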

Marston,  William:  Psychologist,  inventor  of  the  discontinuous  blood  pressure 
method deception test, and author of the 1938 book The Lie Detector Test. Marston
was the first to attempt to have instrumental deception test results entered into
evidence in court, which resulted in the Frye decision of 1923. Marston’s test
entailed the use of a conventional blood pressure cuff and sphygmomanometer with
which  he  manually  plotted  the  examinee’s  blood  pressure  during  questioning  at  several 
points  during  the  interview.  He  taught  his  technique  to  the  U.S.  Army,  and  he  used  his 
method  to  resolve  espionage  cases  during  World  War  I.  Marston  had  several  interests, 
and  he  was  also  the  co-creator  of  the  Wonder  Woman  comic  book  character.  Both 
William  Marston  and  his  wife,  Elizabeth,  were  lawyers,  and  worked  together  to 
perform  deception  testing.  See  discontinuous  blood  pressure  method. 

mean:  The  average.  The  most  common,  the  arithmetic  mean,  is  the  sum  of 
values  divided  by  the  number  of  values.  If  five  subjects  in  a  polygraph  study  were  ages 
19, 23, 28, 22, and 29, the mean age of this group would be (19+23+28+22+29) / 5 =
24.2 years.

median:  The  middle  score.  The  median  value  is  one  where  one-half  of  the  scores 
are  above  and  one-half  are  below  this  value.  It  is  in  the  middle  of  the  distribution,  but 
only  in  terms  of  order.  Medians  are  useful  when  evaluating  highly  skewed  distributions, 
such  as  national  housing  prices,  because  they  are  not  affected  by  extreme  scores. 
Medians  are  not  as  frequently  reported  in  polygraph  research,  but  may  have  application 
such  as  when  examinee  pools  have  characteristics  that  are  not  normally  distributed. 
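
A short sketch of why the median resists extreme scores. The price list is hypothetical and chosen only to include one extreme value; it is not data from this study.

```python
from statistics import mean, median

# Hypothetical home prices in thousands of dollars; the last value is an extreme score.
prices = [180, 195, 210, 225, 240, 2500]

price_mean = mean(prices)      # pulled upward by the single extreme value
price_median = median(prices)  # midpoint of the two middle values, 210 and 225

print(f"mean   = {price_mean:.1f}")
print(f"median = {price_median:.1f}")
```

With one extreme value in the list, the mean lands well above every typical price, while the median stays among them.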

mode:  The  most  commonly  occurring  value  in  a  distribution. 

Modified  General  Question  Test  (MGQT):  Test  format  patterned  after  the  Reid 
test  and  developed  by  the  U.S.  military.  Unlike  the  Zone  formats,  it  has  more  relevant 
questions  than  comparison  questions  and  does  not  include  symptomatic  questions, 
though  some  versions  employ  a  sacrifice  relevant  question.  The  MGQT  is  widely  used 
in  the  field. 

National  Center  for  Credibility  Assessment  (NCCA):  The  federal  government’s 
premier educational center for polygraph and other credibility assessment technologies
and  techniques.  Its  central  mission  is  to  assist  federal  agencies  in  the  protection  of  U.S. 
citizens,  interests,  infrastructure  and  security  by  providing  the  best  education  and  tools 
for  credibility  assessment.  Formerly  the  Department  of  Defense  Polygraph  Institute 
(DoDPI),  then  the  Defense  Academy  for  Credibility  Assessment  (DACA). 

No  Deception  Indicated  (NDI):  In  conventional  polygraph,  NDI  signifies  that 
(1)  the  polygraph  test  recordings  are  stable  and  interpretable  and  (2)  the  evaluation 
criteria  used  by  the  examiner  led  him  to  conclude  that  the  examinee  was  truthful  to  the 
relevant  issue.  The  NDI  and  DI  (Deception  Indicated)  decision  options  are  used  in 
specific-issue  testing  and  correspond  to  NSPR  (No  Significant  Physiological  Responses) 
and  SPR  (Significant  Physiological  Responses)  in  multiple-issue,  or  screening, 
examinations.

No Opinion: Alternate form of an Inconclusive call, especially in the federal
government. Sometimes used to denote an Incomplete call in other sectors.

numerical analysis: Systematic assignment of numbers to physiologic
responses, along with decision rules, so that polygraph data analysis is more
objective and standardized. The first such system was published by Dr. John Winter
in 1936. Contemporary numerical analytic methods include the Rank Order Scoring
System, Horizontal Scoring System, 3-position scoring system, 7-position scoring system,
and Lykken Scoring. Sometimes referred to as semi-objective analysis.

numerical approach: Method of rendering polygraph decisions that are based
exclusively on numeric values that have been assigned to physiological responses
recorded during a structured polygraph examination. The numerical approach does not
consider extra-polygraphic information such as case facts or examinee behaviors. The
numerical approach has four primary components: feature identification,
numerical  value  assignment,  computation  of  the  numerical  values,  and  decision  rules. 
Current  numerical  approaches  include  the  Backster  method,  the  Utah  method,  and  the 
automated  computer  algorithms. 
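
The last three components can be sketched as a small pipeline. This is an illustration only: it assumes feature identification has already produced one response magnitude per question, it uses a three-position rule (a larger relevant response scores -1, a larger comparison response scores +1), and the cut scores of +4 and -4 are hypothetical placeholders rather than values taken from any named scoring system.

```python
def three_position_score(relevant, comparison):
    """Numerical value assignment for one relevant/comparison pair."""
    if relevant > comparison:
        return -1  # stronger response to the relevant question
    if comparison > relevant:
        return 1   # stronger response to the comparison question
    return 0       # no observable difference


def decide(total, ndi_cut=4, di_cut=-4):
    """Decision rule with hypothetical cut scores."""
    if total >= ndi_cut:
        return "NDI"
    if total <= di_cut:
        return "DI"
    return "Inconclusive"


# Hypothetical response magnitudes: (relevant question, comparison question).
pairs = [(1.2, 0.8), (0.9, 0.5), (1.1, 1.1), (1.4, 0.7),
         (0.6, 0.9), (1.3, 0.4), (1.0, 0.6)]

scores = [three_position_score(r, c) for r, c in pairs]
total = sum(scores)  # computation of the numerical values
print(scores, total, decide(total))
```

Nothing outside the assigned numbers enters the decision, which is the defining property of the numerical approach.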

Objective Scoring System (OSS): A form of 7-position scoring where the
individually  assigned  values  are  derived  from  ratios  that  come  from  measurements  of 
the  “Kircher  features.”  Because  the  scores  come  from  measurements,  the  OSS 
eliminates  subjectivity  in  chart  interpretation.  However,  it  is  very  time-intensive  when 
performed  manually,  and  impractical  for  routine  use.  The  OSS  has  been  automated  by 
at  least  one  computer  polygraph  manufacturer. 

pneumograph: A device that records respiration, and one of the three traditional
channels of the modern polygraph. Most contemporary polygraphs
use two pneumograph recordings: abdominal and thoracic. The sensors are either the
traditional convoluted rubber tube, the mercury strain gauge, or the newer piezoelectric sensor.

polygraph: By definition, an instrument that simultaneously records two or more
channels of data. The term now most commonly signifies the instrument and techniques
used in the psychophysiological detection of deception, though polygraphs are also
used in research in other sciences. In polygraph practice, the instrument traditionally records
physiologic activity with four sensors: a blood pressure cuff, electrodermal sensors,
and  two  respiration  sensors.  Some  instruments  also  record  finger  pulse  amplitude  using 
a  photoplethysmograph. 

Receiver Operating Characteristics (ROC): Psychophysical conceptual model
for detection efficiency based on signal detection theory (SDT). The ROC characterizes
the sensitivity of the decision criteria, and is useful to predict false positive and false
negative rates across all levels of a criterion (cutting score, in polygraph). It is an
extension of work from the 1940s regarding the ability of radar operators to
discriminate radar signals of friendly aircraft from those of enemy aircraft or noise.
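
A minimal sketch of how ROC points are traced by sweeping the cutting score. The total scores are hypothetical, and the sketch assumes that lower totals point toward deception, so an examinee is called deceptive when the total is at or below the cut.

```python
# Hypothetical total scores for examinees of known status.
deceptive_scores = [-9, -7, -6, -4, -2, 1]
truthful_scores = [-3, 0, 2, 4, 6, 8]


def roc_points(deceptive, truthful):
    """Return (cut, false_positive_rate, true_positive_rate) for each candidate cut."""
    cuts = sorted(set(deceptive) | set(truthful))
    points = []
    for cut in cuts:
        # "Deceptive" call when the total is at or below the cut score.
        tpr = sum(s <= cut for s in deceptive) / len(deceptive)
        fpr = sum(s <= cut for s in truthful) / len(truthful)
        points.append((cut, fpr, tpr))
    return points


points = roc_points(deceptive_scores, truthful_scores)
for cut, fpr, tpr in points:
    print(f"cut <= {cut:>3}: FPR = {fpr:.2f}, TPR = {tpr:.2f}")
```

Each row is one point on the ROC curve. The false negative rate at a given cut is 1 minus the true positive rate.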

Relevant/Irrelevant (RI) Technique: Family of polygraph test formats in which
traditional  lie  comparison  questions  are  not  employed.  While  originally  used  in  criminal 
testing,  RI  tests  currently  are  more  often  found  in  multiple-issue  screening  applications. 
The  RI  test  can  trace  its  roots  to  word  association  tests  employed  in  the  early  1900s, 
and  these  word  tests  were  later  used  occasionally  during  the  monitoring  of  electrodermal 
activity. The RI was used extensively by pioneers John Larson and Leonarde Keeler in
the  1920s  through  the  1940s,  and  it  is  still  used  today.  Variants  include  the  NSA  RI, 
which  is  used  in  multiple-issue  screening,  and  the  Modified  Relevant/Irrelevant  (MRI), 
used  in  criminal  testing. 

relevant question: A question that deals with the true issue of concern to
the  investigation.  In  addition  to  “did  you  do  it”  types  of  questions,  relevant  questions 
also  include  evidence-connecting  and  “do  you  know  who”  questions.  Strong  relevant 
questions  address  the  “did  you  do  it”  type  of  questions,  while  moderate-strength  relevant 
questions  address  evidence  connecting  and  prior  knowledge,  such  as  participation  in 
planning,  providing  help  to  the  perpetrators,  or  knowing  the  identity  of  the  perpetrators. 
Moderate-strength  relevant  questions  also  address  the  examinee’s  alibi  or  place  him  at 
the  scene  of  the  crime.  The  term  relevant  question  is  not  appropriate  for  Peak  of 
Tension  or  Concealed  Information  Tests.  Rather,  terms  such  as  key  and  critical  item 
are  used  in  these  formats.  Relevant  questions  are  sometimes  called  pertinent  questions 
or,  more  informally,  hot  questions. 

reliability: Stability or consistency of measurement. Reliability studies in
polygraph often examine the rate of decision agreement among examiners on
polygraph test charts. Interrater reliability denotes agreement among examiners,
whereas intrarater agreement (test-retest reliability) pertains to an examiner's agreement
with his or her own decisions when evaluating the charts on different occasions.
Reliability  is  not  the  same  as  validity,  which  means  accuracy.  A  technique  cannot  be 
more  valid  than  it  is  reliable.  A  technique  can  have  high  agreement  without  high 
accuracy,  though  the  reverse  is  not  true. 
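
The distinction between agreement and accuracy can be made concrete with a small sketch. The decisions and the ground truth below are hypothetical, and simple proportion of agreement is assumed as the reliability measure; other indices exist.

```python
def proportion_agreement(calls_a, calls_b):
    """Share of cases on which two sets of decisions match."""
    matches = sum(a == b for a, b in zip(calls_a, calls_b))
    return matches / len(calls_a)


# Hypothetical decisions by two examiners on the same six charts.
examiner_a = ["DI", "DI", "NDI", "DI", "NDI", "DI"]
examiner_b = ["DI", "DI", "NDI", "DI", "NDI", "DI"]
# Hypothetical ground truth for those six cases.
ground_truth = ["DI", "NDI", "NDI", "NDI", "DI", "DI"]

interrater = proportion_agreement(examiner_a, examiner_b)  # reliability
accuracy_a = proportion_agreement(examiner_a, ground_truth)  # validity

print(f"interrater agreement = {interrater:.2f}")
print(f"examiner A accuracy  = {accuracy_a:.2f}")
```

Here the two examiners agree on every chart yet are correct on only half of them: high agreement without high accuracy.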

screening examination: Multiple-issue polygraph testing of applicants,
employees, or persons with U.S. security clearances. Testing formats vary and can
include the Relevant/Irrelevant, ZCT “exploratory,” Test for Espionage and Sabotage,
Modified General Question Test, and others. The strength of screening examinations is
their utility in developing significant information that is most often not obtainable from
any other source. Their weakness is that they are not as powerful as the
specific issue test in terms of validity and reliability.

sensitivity: Ability of a test to detect specific features at all levels of magnitude
or prevalence. If a deception test, for example, could detect physiologic responses that
accompany deception only when they are dramatic, it is not a sensitive test. Conversely,
if it enjoyed perfect efficiency in detecting deception, but misclassified a high
percentage of truthful examinees as deceptive, it would have sensitivity but poor
specificity. Sensitivity and specificity are two dimensions that define the validity of the
test.
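
The contrast drawn in this entry can be put into numbers. The sketch below assumes the conventional diagnostic-test formulas (sensitivity as the share of deceptive examinees correctly identified, specificity as the share of truthful examinees correctly identified), and the counts are hypothetical.

```python
def sensitivity(true_pos, false_neg):
    """Share of deceptive examinees correctly called deceptive."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Share of truthful examinees correctly called truthful."""
    return true_neg / (true_neg + false_pos)


# Hypothetical outcome counts: every deceptive examinee is detected,
# but 30 of 50 truthful examinees are misclassified as deceptive.
tp, fn = 50, 0
tn, fp = 20, 30

sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
print(f"sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```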


Signal Detection Theory (SDT): Theory which explains how to make “optimal”
decisions when ground truth is unknown. If the benefits and costs of true positives,
false positives, true negatives, and false negatives of a polygraph decision can be
determined, and the probabilities of each event can be estimated, then signal detection
theory can be used to determine the optimal cut scores over many decisions. SDT is a
rational approach to establishing cutting scores in polygraphy.
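
A minimal sketch of choosing a cut score by expected value. Every input is a hypothetical assumption: the scores, the 0.5 base rate of deception, and the payoffs (a false positive is weighted three times as costly as a false negative). Lower totals are assumed to point toward deception.

```python
# Hypothetical total scores for examinees of known status.
deceptive_scores = [-9, -7, -6, -4, -2, 1]
truthful_scores = [-3, 0, 2, 4, 6, 8]

BASE_RATE = 0.5  # assumed probability that an examinee is deceptive
PAYOFF = {"tp": 1.0, "fn": -1.0, "tn": 1.0, "fp": -3.0}  # assumed benefits and costs


def expected_value(cut):
    """Expected payoff per decision when 'deceptive' is called at or below the cut."""
    tpr = sum(s <= cut for s in deceptive_scores) / len(deceptive_scores)
    fpr = sum(s <= cut for s in truthful_scores) / len(truthful_scores)
    value_if_deceptive = tpr * PAYOFF["tp"] + (1 - tpr) * PAYOFF["fn"]
    value_if_truthful = fpr * PAYOFF["fp"] + (1 - fpr) * PAYOFF["tn"]
    return BASE_RATE * value_if_deceptive + (1 - BASE_RATE) * value_if_truthful


candidate_cuts = sorted(set(deceptive_scores) | set(truthful_scores))
best_cut = max(candidate_cuts, key=expected_value)
print("best cut:", best_cut, "expected value:", round(expected_value(best_cut), 3))
```

Changing the payoffs or the base rate moves the optimal cut, which is why the costs and probabilities have to be estimated before the cut score is set.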

signal value: The perceived significance of a stimulus for a subject. External
significance is assigned to a question when it appears to differ from others based on
appearance (e.g., is much longer or is read in a louder tone of voice). Internal
significance is assigned to a question because of the subject’s perception of the
question’s scope or content. The objective of a CQT examination is to make the external
significance of relevant and comparison questions appear equal, and for their internal
significance to vary. An innocent examinee would be expected to generate higher
internal significance for the comparison questions, whereas the relevant questions would
hold higher internal significance for the deceptive.

skin conductance (SC): Broad term for two exosomatic electrodermal
phenomena, skin conductance level and skin conductance response. In recent years
polygraph instrumentation has moved away from skin resistance measures toward
skin conductance because SC measurement has less impact on the underlying
mechanisms that produce the effect.

specificity: A term most used in the scientific literature to describe the
selectivity of a test. It is the ability of a test to separate one element from among many
elements regardless of their similarity. The specificity of a test will determine its
efficiency. If a polygraph test can detect deception 100% of the time, but classifies
artifacts such as PVCs and coughs as deception, it does not have good specificity.
Specificity and sensitivity constrain the validity of a test.

specific issue polygraph examination: A polygraph examination,
almost always administered in conjunction with a criminal investigation, that usually
addresses a single issue. Sometimes called a specific by polygraph practitioners to
differentiate it from pre-employment or periodic testing.

sphygmograph: An instrument for graphically recording arterial pulse and blood
pressure. A more precise term for the cardiograph channel in polygraph.

spot: A permanently assigned location of a relevant question in a CQT question
series.

spot  analysis:  The  numerical  evaluation  of  a  relevant  question  by  comparing  it  to 
a  comparison  question  no  further  than  one  position  to  the  left  or  right  of  that  spot 
location.  A  “spot”  represents  the  location  of  a  relevant  question  in  a  question  series; 
the  physiological  data  at  the  relevant  question  (spot)  are  compared  with  those  of  an 
adjacent  comparison  question. 

standard  deviation:  Statistical  term  for  a  standardized  unit  of  dispersion  of 
scores.  When  scores  are  clustered  closely  together,  the  standard  deviation  is  small, 
whereas  a  wide  spread  would  have  a  larger  standard  deviation.  Mathematically,  the 
standard  deviation  is  the  square  root  of  the  variance.  Conceptually,  the  standard 
deviation  is  the  square  root  of  the  average  squared  deviation  from  the  mean. 
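
The conceptual definition can be followed step by step, reusing the five ages from the mean entry of this glossary. The sketch divides by the number of values, matching the "average squared deviation" wording (the population form); a sample standard deviation would divide by n - 1 instead.

```python
import math
from statistics import pstdev

ages = [19, 23, 28, 22, 29]

mean_age = sum(ages) / len(ages)                      # 24.2
squared_deviations = [(a - mean_age) ** 2 for a in ages]
variance = sum(squared_deviations) / len(ages)        # average squared deviation
std_dev = math.sqrt(variance)                         # square root of the variance

print(f"variance = {variance:.2f}, standard deviation = {std_dev:.3f}")

# Cross-check against the standard library's population standard deviation.
assert math.isclose(std_dev, pstdev(ages))
```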

statistic:  A  measurement  of  a  sample.  There  are  several  ways  to  measure 
samples,  including  the  mean,  standard  deviation,  and  variance.  When  these 
measurements  are  taken  from  an  entire  population  they  are  referred  to  as  parameters. 

statistical  significance:  Phrase  to  describe  an  experimental  result  that  is  unlikely 
to  have  occurred  by  chance.  Conventional  probability  thresholds  of  statistical 
significance  are  0.05  and  0.01. 

symptomatic  question:  A  question  type  developed  by  Cleve  Backster  which  is 
used to identify whether or not an examinee is fearful the examiner will ask an
unreviewed  question  embracing  an  outside  issue  that  is  bothering  the  examinee.  This 
mistrust  of  the  examiner  will  putatively  dampen  the  examinee’s  responses  to  other  test 
questions.  Symptomatic  questions  are  widely  used,  though  the  trend  in  the  research  is 
that  there  is  no  meaningful  effect. 

test: In polygraph, the term test is frequently used to differentiate a single running
of  a  question  series  (sometimes  also  called  a  chart)  during  physiological  recording  from 
the  examination,  which  is  considered  to  be  the  totality  of  the  polygraph  process.  It  can 
also  refer  to  specialized  procedures  within  techniques,  such  as  the  Yes  Test  and 
stimulation  test.  Test  has  been  used  to  refer  to  polygraph  techniques,  such  as  the  Zone 
Comparison  Test,  or  Modified  General  Question  Test.  The  specific  meaning  of  test  can 
depend  on  the  context. 

true negative: Correct decision that the variable of interest is not present (i.e.,
an accurate polygraph outcome of innocence).

true positive: Correct decision that the variable of interest is present (i.e.,
an accurate polygraph outcome of guilt).

validity:  Accuracy.  There  are  several  types  of  validity.  The  degree  to  which  a 
test  measures  what  it  professes  to  measure  is  construct  validity.  External  validity  relates 
to  the  generalizability  of  the  research  results  out  of  the  laboratory.  While  there  are 
other  types  of  validity  as  well,  these  two  types  go  to  the  heart  of  research  in  polygraph. 

You  Phase:  The  strongest  and  one  of  the  most  commonly  used  of  the  formats  in 
the  Backster  Zone  Comparison  Technique.  The  standardized  test  addresses  a  single 
issue  and  single  degree  of  involvement  in  the  issue.  The  format  provides  for  two  or  three 
relevant  questions,  worded  slightly  differently  from  one  another,  addressing  the  single 
issue  and  degree  of  involvement.  It  also  requires  a  repeat  of  the  relevant  question 
wording  in  the  sacrifice  relevant  question.  The  You  Phase  ZCT  is  a  very  powerful 
test  because  it  is  so  highly  focused  on  essentially  one  question.  An  example  of  You 
Phase  question  wording  is:  sacrifice  relevant — “Regarding  whether  or  not  you  shot 
Henry  Jones,  do  you  intend  to  answer  truthfully  each  question  about  that?”;  relevant  1 — 
“Did  you  shoot  Henry  Jones?”;  relevant  2 — “Did  you  fire  the  shot  that  caused  the  death 
of  Henry  Jones?”;  relevant  3 —  “Last  Friday  night,  did  you  shoot  Henry  Jones?” 

Zone:  Concept  coined  by  Cleve  Backster.  A  zone  is  a  twenty  to  thirty-five 
second  block  of  polygraph  chart  time  initiated  by  a  question  having  a  unique 
psychological  focusing  appeal  to  a  predictable  group  of  examinees.  In  his  ZCT, 
Backster  used  color-coding  to  identify  the  three  zones  in  the  ZCT:  red,  green,  and 
black.  Respectively,  the  red  zone  for  relevant  questions,  the  green  zone  for  comparison 
questions,  and  the  black  zone  for  symptomatic  questions. 

Zone  Comparison  Technique  (ZCT):  Polygraph  technique  developed  by  Cleve 
Backster  that  contains  three  Zones  (black,  red,  green),  with  comparison  of  responses 
between  two  of  the  Zones  (red  and  green)  for  a  determination  of  truth  or  deception.  The 
ZCT  is  designed  to  pose  a  threat  to  the  well-being  of  examinees,  regardless  of  their 
innocence  or  guilt,  and  compel  them  to  focus  their  attention  on  a  specific  zone 
question(s). There are several varieties, including the “You Phase,” “Exploratory,”
“S-K-Y,” “DoDPI,” and “Utah.” The ZCT was the first modern polygraph technique
in general use to incorporate numerical analysis. The ZCT is probably used more
often in forensic applications than any other format.